Libraries

#Here are a bunch of libraries that you may or may not need


library(agricolae) #used for simple effects
library(apaTables) #makes apa-formatted tables
library(broom) #converts objects so they can be used by tidy and ggplot
library(devtools) #package development tools; also lets you install packages from GitHub
library(e1071) # for skew stats
library(effects) #effect displays for linear and generalized linear models
library(ez) #anovas, rm anovas
library(GGally) #addon for GGplot2
library(ggmap) #maps
library(ggplot2) #the prettiest graphs you ever did see
library(gridExtra) #nice tables for grobs
library(hexbin) #density graphs 
library(HH) #graphics and ANOVA support (companion to Heiberger & Holland's Statistical Analysis and Data Display)
library(Hmisc) #arranging grobs, writing things as.numeric etc.
library(kableExtra) #tables for html
library(knitr) #lets you do extra stuff when you knit to html
library(lme4) #linear mixed effects models
library(lsr) #used for etaSquared 
library(MBESS) #needed for some apa tables
library(multcomp) #multiple comparisons / post-hoc tests (glht)
library(nlme) #more linear model stuff (and nonlinear)
library(plyr) #for splitting, combining data 
library(psych) #basically all the most basic stats in r 
library(purrr) #functional programming tools (map() and friends)
library(RColorBrewer) #makes everything beautiful and colorful
library(readxl) #lets you import from excel
library(reshape2) #helps you reshape data, put in longform, etc
library(rgeos) #maps
library(rnaturalearth) #maps
library(rnaturalearthdata) #maps 
library(scales) #for making those color mapped correlations 
library(sf) #maps
library(stargazer) #makes nice tables for sweave,html
library(texreg) #puts output to html tables 
library(tibble) #colnames and rownames. also good for grobs
library(tidyr) #more data manipulation stuff
library(xtable) #exports tables to LaTeX/HTML
library(lmerTest) #p-values for lme4 mixed models
library(robustlmm) #robust linear mixed effects models
library(WRS) # Rand Wilcox's package for robust stats
library(WRS2) #robust stats (the CRAN companion to WRS)

library(esc) #effect size calculators

#for whatever reason, this package needs to be loaded last
library(dplyr) #helps with summary tables

R Basics

In its simplest form, you can use R as a glorified calculator. To run a line, press Command+Enter (Ctrl+Enter on Windows). To run a whole chunk, you can press the green play button in the top right of the chunk.

Arithmetic
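The code chunks were lost when this page was converted to text; here is a minimal sketch that reproduces the output below (the exact expressions are assumptions):

```r
# R evaluates each expression and prints the result
1 + 2  # addition
9 / 3  # division
2 * 3  # multiplication
```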

## [1] 3
## [1] 3
## [1] 6

Order of operations
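The original expression is gone, so here is a hedged illustration of how parentheses change precedence (this is not the chunk that produced the output below):

```r
# multiplication binds tighter than addition
2 + 3 * 4    # evaluates the 3 * 4 first
(2 + 3) * 4  # parentheses force the addition first
```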

## [1] 12.68268

Creating lists (technically, vectors). Use c()

how long is your list?
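The chunk is missing; judging from the indexing output further down, the vector was probably something like this:

```r
# c() combines values into a vector
a <- c(5, 4, 1, 2, 13, 6, 7, 8)
length(a)  # number of elements
```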

## [1] 8

Indexing: grab a single value from a list, or multiple values
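A sketch that matches the output below (the vector itself is reconstructed from that output):

```r
a <- c(5, 4, 1, 2, 13, 6, 7, 8)
a[3]    # a single value: the third element
a[4:8]  # a range: elements four through eight
a[-4]   # a negative index drops that element
```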

## [1] 1
## [1]  2 13  6  7  8
## [1]  5  4  1 13  6  7  8

rbind vs. cbind: what is the difference?
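A sketch matching the output below: rbind() binds vectors together as rows, cbind() as columns.

```r
a <- c(300, 4, 1, 2, 13, 6, 7, 8, 5, 20)
b <- 1:10
rbind(a, b)  # 2 rows x 10 columns
cbind(a, b)  # 10 rows x 2 columns
```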

##  [1]  1  2  3  4  5  6  7  8  9 10
##  [1]  1  2  3  4  5  6  7  8  9 10
##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## a  300    4    1    2   13    6    7    8    5    20
## b    1    2    3    4    5    6    7    8    9    10
##         a  b
##  [1,] 300  1
##  [2,]   4  2
##  [3,]   1  3
##  [4,]   2  4
##  [5,]  13  5
##  [6,]   6  6
##  [7,]   7  7
##  [8,]   8  8
##  [9,]   5  9
## [10,]  20 10

Data Manipulation

You gotta gotta gotta know how to manipulate your data, or else you will spend HOURS trying to figure out why your stats aren’t running and why your graphs look like abstract art.

Data Import

Note: this dataset is for the Flanker Fish task from the executive function battery. It includes:

- 3 groups (M, S, C)
- 2 Years
- 3 Blocks/Year
- 3 Conditions/Block
- Accuracy, RT

Additional variables are things like gender, age, WASI, etc. that we will use in different tests.

Again, this is fake data!!!!!
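The import chunk did not survive; here is a sketch using readxl (the file name is a placeholder, not the real path):

```r
library(readxl)

# read the (fake) flanker spreadsheet into a tibble
flanker <- read_excel("flanker_fake_data.xlsx")  # placeholder file name
```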

Data at a Glance

Something pretty important in R is being able to look at your data quickly.
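A sketch of the calls behind the output below (assuming the imported data frame is named flanker):

```r
head(flanker)      # first six rows
colnames(flanker)  # every variable name
```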

## # A tibble: 6 x 55
##   ID    Group   Age Gender  WASI Y4_Total_AccSta… Y4_Incongruent_…
##   <chr> <dbl> <dbl> <chr>  <dbl>            <dbl>            <dbl>
## 1 MUSI…     1    83 M        108           0.582             0.980
## 2 MUSI…     1    80 F        128           0.748             0.279
## 3 MUSI…     1    88 F         66           0.927             0.843
## 4 MUSI…     1    81 F        113           0.681             0.777
## 5 MUSI…     1    78 M        105           0.0506            0.391
## 6 MUSI…     1    79 M         80           0.834             0.643
## # … with 48 more variables: Y4_Neutral_AccStandard <dbl>,
## #   Y4_Congruent_AccStandard <dbl>, Y4_Incongruent_RTStandard <dbl>,
## #   Y4_Neutral_RTStandard <dbl>, Y4_Congruent_RTStandard <dbl>,
## #   Y4_Total_RTStandard <dbl>, Y4_acc_Reversed <dbl>,
## #   Y4_acc_I_Reversed <dbl>, Y4_acc_N_Reversed <dbl>,
## #   Y4_acc_C_Reversed <dbl>, Y4_RT_I_Reversed <dbl>,
## #   Y4_RT_N_Reversed <dbl>, Y4_RT_C_Reversed <dbl>, Y4_RT_Reversed <dbl>,
## #   Y4_acc_Mixed <dbl>, Y4_acc_I_Mixed <dbl>, Y4_acc_N_Mixed <dbl>,
## #   Y4_acc_C_Mixed <dbl>, Y4_RT_I_Mixed <dbl>, Y4_RT_N_Mixed <dbl>,
## #   Y4_RT_C_Mixed <dbl>, Y4_RT_Mixed <dbl>, Y4_Switch_Cost_ <dbl>,
## #   Y5_Total_AccStandard <dbl>, Y5_Incongruent_AccStandard <dbl>,
## #   Y5_Neutral_AccStandard <dbl>, Y5_Congruent_AccStandard <dbl>,
## #   Y5_Incongruent_RTStandard <dbl>, Y5_Neutral_RTStandard <dbl>,
## #   Y5_Congruent_RTStandard <dbl>, Y5_Total_RTStandard <dbl>,
## #   Y5_acc_Reversed <dbl>, Y5_acc_I_Reversed <dbl>,
## #   Y5_acc_N_Reversed <dbl>, Y5_acc_C_Reversed <dbl>,
## #   Y5_RT_I_Reversed <dbl>, Y5_RT_N_Reversed <dbl>,
## #   Y5_RT_C_Reversed <dbl>, Y5_RT_Reversed <dbl>, Y5_acc_Mixed <dbl>,
## #   Y5_acc_I_Mixed <dbl>, Y5_acc_N_Mixed <dbl>, Y5_acc_C_Mixed <dbl>,
## #   Y5_RT_I_Mixed <dbl>, Y5_RT_N_Mixed <dbl>, Y5_RT_C_Mixed <dbl>,
## #   Y5_RT_Mixed <dbl>, Y5_Switch_Cost <dbl>
##  [1] "ID"                         "Group"                     
##  [3] "Age"                        "Gender"                    
##  [5] "WASI"                       "Y4_Total_AccStandard"      
##  [7] "Y4_Incongruent_AccStandard" "Y4_Neutral_AccStandard"    
##  [9] "Y4_Congruent_AccStandard"   "Y4_Incongruent_RTStandard" 
## [11] "Y4_Neutral_RTStandard"      "Y4_Congruent_RTStandard"   
## [13] "Y4_Total_RTStandard"        "Y4_acc_Reversed"           
## [15] "Y4_acc_I_Reversed"          "Y4_acc_N_Reversed"         
## [17] "Y4_acc_C_Reversed"          "Y4_RT_I_Reversed"          
## [19] "Y4_RT_N_Reversed"           "Y4_RT_C_Reversed"          
## [21] "Y4_RT_Reversed"             "Y4_acc_Mixed"              
## [23] "Y4_acc_I_Mixed"             "Y4_acc_N_Mixed"            
## [25] "Y4_acc_C_Mixed"             "Y4_RT_I_Mixed"             
## [27] "Y4_RT_N_Mixed"              "Y4_RT_C_Mixed"             
## [29] "Y4_RT_Mixed"                "Y4_Switch_Cost_"           
## [31] "Y5_Total_AccStandard"       "Y5_Incongruent_AccStandard"
## [33] "Y5_Neutral_AccStandard"     "Y5_Congruent_AccStandard"  
## [35] "Y5_Incongruent_RTStandard"  "Y5_Neutral_RTStandard"     
## [37] "Y5_Congruent_RTStandard"    "Y5_Total_RTStandard"       
## [39] "Y5_acc_Reversed"            "Y5_acc_I_Reversed"         
## [41] "Y5_acc_N_Reversed"          "Y5_acc_C_Reversed"         
## [43] "Y5_RT_I_Reversed"           "Y5_RT_N_Reversed"          
## [45] "Y5_RT_C_Reversed"           "Y5_RT_Reversed"            
## [47] "Y5_acc_Mixed"               "Y5_acc_I_Mixed"            
## [49] "Y5_acc_N_Mixed"             "Y5_acc_C_Mixed"            
## [51] "Y5_RT_I_Mixed"              "Y5_RT_N_Mixed"             
## [53] "Y5_RT_C_Mixed"              "Y5_RT_Mixed"               
## [55] "Y5_Switch_Cost"

To check how many rows/columns you have, you can use the length function: length() on a single column counts its rows, while length() on the whole data frame counts its columns. nrow() and ncol() are more explicit alternatives.
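A sketch matching the counts below (flanker is the assumed name):

```r
length(flanker$ID)  # length of one column = number of rows
length(flanker)     # length of the data frame = number of columns
```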

## [1] 51
## [1] 55

Also important is the class of your variables.

Class is important because it determines what kinds of stuff you can do with a variable. Factoring gets R to recognize something as categorical, for example.

To change the class of a variable, simply write:
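The chunk is gone; a sketch that matches the output below:

```r
flanker$ID <- as.character(flanker$ID)  # coerce one variable to a new class
class(flanker$ID)                       # check a single column
lapply(flanker, class)                  # check every column at once
```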

## [1] "character"
## $ID
## [1] "character"
## 
## $Group
## [1] "numeric"
## 
## $Age
## [1] "numeric"
## 
## $Gender
## [1] "character"
## 
## $WASI
## [1] "numeric"
## 
## $Y4_Total_AccStandard
## [1] "numeric"
## 
## $Y4_Incongruent_AccStandard
## [1] "numeric"
## 
## $Y4_Neutral_AccStandard
## [1] "numeric"
## 
## $Y4_Congruent_AccStandard
## [1] "numeric"
## 
## $Y4_Incongruent_RTStandard
## [1] "numeric"
## 
## $Y4_Neutral_RTStandard
## [1] "numeric"
## 
## $Y4_Congruent_RTStandard
## [1] "numeric"
## 
## $Y4_Total_RTStandard
## [1] "numeric"
## 
## $Y4_acc_Reversed
## [1] "numeric"
## 
## $Y4_acc_I_Reversed
## [1] "numeric"
## 
## $Y4_acc_N_Reversed
## [1] "numeric"
## 
## $Y4_acc_C_Reversed
## [1] "numeric"
## 
## $Y4_RT_I_Reversed
## [1] "numeric"
## 
## $Y4_RT_N_Reversed
## [1] "numeric"
## 
## $Y4_RT_C_Reversed
## [1] "numeric"
## 
## $Y4_RT_Reversed
## [1] "numeric"
## 
## $Y4_acc_Mixed
## [1] "numeric"
## 
## $Y4_acc_I_Mixed
## [1] "numeric"
## 
## $Y4_acc_N_Mixed
## [1] "numeric"
## 
## $Y4_acc_C_Mixed
## [1] "numeric"
## 
## $Y4_RT_I_Mixed
## [1] "numeric"
## 
## $Y4_RT_N_Mixed
## [1] "numeric"
## 
## $Y4_RT_C_Mixed
## [1] "numeric"
## 
## $Y4_RT_Mixed
## [1] "numeric"
## 
## $Y4_Switch_Cost_
## [1] "numeric"
## 
## $Y5_Total_AccStandard
## [1] "numeric"
## 
## $Y5_Incongruent_AccStandard
## [1] "numeric"
## 
## $Y5_Neutral_AccStandard
## [1] "numeric"
## 
## $Y5_Congruent_AccStandard
## [1] "numeric"
## 
## $Y5_Incongruent_RTStandard
## [1] "numeric"
## 
## $Y5_Neutral_RTStandard
## [1] "numeric"
## 
## $Y5_Congruent_RTStandard
## [1] "numeric"
## 
## $Y5_Total_RTStandard
## [1] "numeric"
## 
## $Y5_acc_Reversed
## [1] "numeric"
## 
## $Y5_acc_I_Reversed
## [1] "numeric"
## 
## $Y5_acc_N_Reversed
## [1] "numeric"
## 
## $Y5_acc_C_Reversed
## [1] "numeric"
## 
## $Y5_RT_I_Reversed
## [1] "numeric"
## 
## $Y5_RT_N_Reversed
## [1] "numeric"
## 
## $Y5_RT_C_Reversed
## [1] "numeric"
## 
## $Y5_RT_Reversed
## [1] "numeric"
## 
## $Y5_acc_Mixed
## [1] "numeric"
## 
## $Y5_acc_I_Mixed
## [1] "numeric"
## 
## $Y5_acc_N_Mixed
## [1] "numeric"
## 
## $Y5_acc_C_Mixed
## [1] "numeric"
## 
## $Y5_RT_I_Mixed
## [1] "numeric"
## 
## $Y5_RT_N_Mixed
## [1] "numeric"
## 
## $Y5_RT_C_Mixed
## [1] "numeric"
## 
## $Y5_RT_Mixed
## [1] "numeric"
## 
## $Y5_Switch_Cost
## [1] "numeric"

attributes() gives you basically everything you might want to know about how R recognizes your data. You can call it on an entire dataframe, or on a single variable.
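A sketch: a bare numeric column carries no extra attributes, which is why the output below is NULL (which column was inspected here is an assumption):

```r
attributes(flanker$Age)  # a plain numeric vector has no attributes: NULL
```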

## NULL

Factoring

Factoring is necessary in order for R to recognize things as different categories rather than individual characters, numbers, etc.
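A sketch matching the output below:

```r
flanker$Group <- factor(flanker$Group)

flanker$Group              # the values plus their levels
class(flanker$Group)       # "factor"
attributes(flanker$Group)  # $levels and $class
attributes(flanker)        # $names, $row.names, $class of the whole data frame
```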

##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3
## [36] 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2
## Levels: 1 2 3
## [1] "factor"
## $levels
## [1] "1" "2" "3"
## 
## $class
## [1] "factor"
## $names
##  [1] "ID"                         "Group"                     
##  [3] "Age"                        "Gender"                    
##  [5] "WASI"                       "Y4_Total_AccStandard"      
##  [7] "Y4_Incongruent_AccStandard" "Y4_Neutral_AccStandard"    
##  [9] "Y4_Congruent_AccStandard"   "Y4_Incongruent_RTStandard" 
## [11] "Y4_Neutral_RTStandard"      "Y4_Congruent_RTStandard"   
## [13] "Y4_Total_RTStandard"        "Y4_acc_Reversed"           
## [15] "Y4_acc_I_Reversed"          "Y4_acc_N_Reversed"         
## [17] "Y4_acc_C_Reversed"          "Y4_RT_I_Reversed"          
## [19] "Y4_RT_N_Reversed"           "Y4_RT_C_Reversed"          
## [21] "Y4_RT_Reversed"             "Y4_acc_Mixed"              
## [23] "Y4_acc_I_Mixed"             "Y4_acc_N_Mixed"            
## [25] "Y4_acc_C_Mixed"             "Y4_RT_I_Mixed"             
## [27] "Y4_RT_N_Mixed"              "Y4_RT_C_Mixed"             
## [29] "Y4_RT_Mixed"                "Y4_Switch_Cost_"           
## [31] "Y5_Total_AccStandard"       "Y5_Incongruent_AccStandard"
## [33] "Y5_Neutral_AccStandard"     "Y5_Congruent_AccStandard"  
## [35] "Y5_Incongruent_RTStandard"  "Y5_Neutral_RTStandard"     
## [37] "Y5_Congruent_RTStandard"    "Y5_Total_RTStandard"       
## [39] "Y5_acc_Reversed"            "Y5_acc_I_Reversed"         
## [41] "Y5_acc_N_Reversed"          "Y5_acc_C_Reversed"         
## [43] "Y5_RT_I_Reversed"           "Y5_RT_N_Reversed"          
## [45] "Y5_RT_C_Reversed"           "Y5_RT_Reversed"            
## [47] "Y5_acc_Mixed"               "Y5_acc_I_Mixed"            
## [49] "Y5_acc_N_Mixed"             "Y5_acc_C_Mixed"            
## [51] "Y5_RT_I_Mixed"              "Y5_RT_N_Mixed"             
## [53] "Y5_RT_C_Mixed"              "Y5_RT_Mixed"               
## [55] "Y5_Switch_Cost"            
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50 51
## 
## $class
## [1] "tbl_df"     "tbl"        "data.frame"

Labelling things by group

This creates new dataframes, which can be helpful when running analyses, or just when you want to compare summaries between groups very quickly.
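One way to sketch this is with subset() (the group labels are assumed):

```r
# one data frame per group
music   <- subset(flanker, Group == "Music")
sport   <- subset(flanker, Group == "Sport")
control <- subset(flanker, Group == "Control")
```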

Changing the order of things

THERE IS A WRONG AND A RIGHT WAY TO DO THIS.

HERE IS THE WRONG WAY (I’m not even going to put it in real code because it is THAT BAD)

If you do the above, you are DOOMED to fail. You are just relabelling your data and swapping the NAMES of your factors, not the ORDER

HERE IS THE RIGHT WAY:
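A sketch of the right way: rebuild the factor, stating the existing levels in the order you want them and the labels you want attached (the 1/2/3 to Music/Sport/Control mapping is taken from the summaries further down):

```r
# levels = the values as they exist; labels = what to call them, in that order
flanker$Group <- factor(flanker$Group,
                        levels = c(1, 2, 3),
                        labels = c("Music", "Sport", "Control"))
```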

Changing Individual cells
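A sketch (the row number here is a made-up example):

```r
flanker[5, "Gender"] <- "M"  # overwrite one cell: row 5 of the Gender column
flanker[5, "Gender"]         # check it
```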

## # A tibble: 1 x 1
##   Gender
##   <fct> 
## 1 M

Removing NA’s

NA’s cause lots of problems in analysis. Lots of stats will not run if you have NA’s.
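A sketch matching the row counts below:

```r
nrow(flanker)                     # rows before
flanker_noNA <- na.omit(flanker)  # drop every row that contains an NA
nrow(flanker_noNA)                # rows after
```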

## [1] 51
## [1] 48

Removing an entire row
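A sketch: a negative row index drops that row (which row was dropped here is an assumption):

```r
flanker <- flanker[-51, ]  # drop row 51
nrow(flanker)              # one fewer row
```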

## # A tibble: 51 x 55
##    ID    Group   Age Gender  WASI Y4_Total_AccSta… Y4_Incongruent_…
##    <chr> <fct> <dbl> <fct>  <dbl>            <dbl>            <dbl>
##  1 MUSI… Music    83 F        108           0.582             0.980
##  2 MUSI… Music    80 F        128           0.748             0.279
##  3 MUSI… Music    88 F         66           0.927             0.843
##  4 MUSI… Music    81 F        113           0.681             0.777
##  5 MUSI… Music    78 M        105           0.0506            0.391
##  6 MUSI… Music    79 M         80           0.834             0.643
##  7 MUSI… Music    86 F         93           0.846             0.071
##  8 MUSI… Music    86 F        105           0.961             0.361
##  9 MUSI… Music    81 F        154           0.340             0.686
## 10 MUSI… Music    84 F        142           0.0514            0.164
## # … with 41 more rows, and 48 more variables:
## #   Y4_Neutral_AccStandard <dbl>, Y4_Congruent_AccStandard <dbl>,
## #   Y4_Incongruent_RTStandard <dbl>, Y4_Neutral_RTStandard <dbl>,
## #   Y4_Congruent_RTStandard <dbl>, Y4_Total_RTStandard <dbl>,
## #   Y4_acc_Reversed <dbl>, Y4_acc_I_Reversed <dbl>,
## #   Y4_acc_N_Reversed <dbl>, Y4_acc_C_Reversed <dbl>,
## #   Y4_RT_I_Reversed <dbl>, Y4_RT_N_Reversed <dbl>,
## #   Y4_RT_C_Reversed <dbl>, Y4_RT_Reversed <dbl>, Y4_acc_Mixed <dbl>,
## #   Y4_acc_I_Mixed <dbl>, Y4_acc_N_Mixed <dbl>, Y4_acc_C_Mixed <dbl>,
## #   Y4_RT_I_Mixed <dbl>, Y4_RT_N_Mixed <dbl>, Y4_RT_C_Mixed <dbl>,
## #   Y4_RT_Mixed <dbl>, Y4_Switch_Cost_ <dbl>, Y5_Total_AccStandard <dbl>,
## #   Y5_Incongruent_AccStandard <dbl>, Y5_Neutral_AccStandard <dbl>,
## #   Y5_Congruent_AccStandard <dbl>, Y5_Incongruent_RTStandard <dbl>,
## #   Y5_Neutral_RTStandard <dbl>, Y5_Congruent_RTStandard <dbl>,
## #   Y5_Total_RTStandard <dbl>, Y5_acc_Reversed <dbl>,
## #   Y5_acc_I_Reversed <dbl>, Y5_acc_N_Reversed <dbl>,
## #   Y5_acc_C_Reversed <dbl>, Y5_RT_I_Reversed <dbl>,
## #   Y5_RT_N_Reversed <dbl>, Y5_RT_C_Reversed <dbl>, Y5_RT_Reversed <dbl>,
## #   Y5_acc_Mixed <dbl>, Y5_acc_I_Mixed <dbl>, Y5_acc_N_Mixed <dbl>,
## #   Y5_acc_C_Mixed <dbl>, Y5_RT_I_Mixed <dbl>, Y5_RT_N_Mixed <dbl>,
## #   Y5_RT_C_Mixed <dbl>, Y5_RT_Mixed <dbl>, Y5_Switch_Cost <dbl>
## [1] 50

Outliers

Great. Now you have some nice data. But is it really what you want? What about outliers? Let's go ahead and figure out what those are and decide if we want to remove them.

Let’s do it by group.
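The original chunk is gone; one base-R sketch is to z-score within each group and flag extreme cases (the variable and the 2.5 SD cutoff here are assumptions, not the author's choices):

```r
# z-score WASI within each Group, then flag cases beyond the cutoff
z <- ave(flanker$WASI, flanker$Group,
         FUN = function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))
flanker$WASI_outlier <- abs(z) > 2.5
table(flanker$WASI_outlier)
```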

Long Form

Ok. This is great and all, but you can’t really do very many stats on it, because it is in wide format.

Please note: this dataset was organized very carefully (only for the Standard block, ha ha), so that the underscores separate the different variables. For example:

Y4_Total_AccStandard will split into 3 variables: Year = Y4, Condition = Total, Var = AccStandard

There are other ways to do this, like many things in R, but this appears to be the most efficient IF you set up your headers correctly
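A sketch of that approach with tidyr (the verbs and the choice of id columns are assumptions; gather/separate/spread were the standard tidyr verbs when this was written):

```r
library(tidyr)
library(dplyr)

flanker_long <- flanker %>%
  # stack every Year_Condition_Measure column into key/value pairs
  gather(key, value, -ID, -Group, -Age, -Gender, -WASI) %>%
  # split the column name at the underscores
  separate(key, into = c("Year", "Condition", "Var"), sep = "_") %>%
  # give AccStandard and RTStandard their own columns again
  spread(Var, value)
```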


Another way to make things long, for simpler data, is to use melt. It is a little more work-intensive.
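A sketch with reshape2's melt() (the id variables are assumed):

```r
library(reshape2)

# id.vars stay as-is; every other column is stacked into variable/value pairs
flanker_melt <- melt(flanker,
                     id.vars = c("ID", "Group", "Age", "Gender", "WASI"),
                     variable.name = "Measure", value.name = "Score")
head(flanker_melt)
```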

MAJOR KEY!!!!!: BEFORE YOU RENAME YOUR COLUMNS, CHECK TO MAKE SURE THEY ARE IN THE RIGHT ORDER!!!!!! OTHERWISE, VAST CONFUSION MAY OCCUR

Removing incomplete data

Ok, but there is one more problem: we can't really look at results across years with a simple RM ANOVA unless all the participants have data for each year. So, we remove those that don't have all years. This can be done with na.omit() when in WIDE form, or it can be done by removing subjects that don't have more than x iterations of their ID (if you already have long form data)

Let’s go back to our original dataset (flanker_long)

The “>” term should be one less than your max number of iterations. So here I want each participant to have 12 data points (4 conditions (total, incongruent, congruent, neutral) for each of the 3 years). If someone has fewer than 12, they are OUT.
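A base-R sketch of that filter (flanker_long and the count of 12 come from the text above):

```r
# count how many rows each ID has; keep IDs with more than 11 (i.e., all 12)
counts <- table(flanker_long$ID)
keep   <- names(counts)[counts > 11]
flanker_long <- flanker_long[flanker_long$ID %in% keep, ]
```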


ONE LAST THING. The long data includes a condition called “Total”. We don't want that because it is just a sum of everything else! Let's remove it.
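A sketch:

```r
# drop the Total condition: it is just the sum of the other conditions
flanker_long <- subset(flanker_long, Condition != "Total")
# note: if Condition is a factor, the empty "Total" level sticks around
# unless you also run droplevels() -- that's why later summaries can still
# show Total with a count of 0
```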

Statistics

Woohoo! We are now moving on to statistics! Below are some very basic stats tests you might want to run.

Choosing a Test

Which test do I use? Excellent question. Here is a nice website with details and R/SPSS code that you can use to make those decisions.

Additionally, below are two types of decision trees that are helpful.

It should be noted that the images below outline pretty old methods of statistics that are not super robust, meaning you may compromise some statistical power and potentially get some misleading results. As noted above, this is not a statistics course, so we will not be going into the math behind such arguments. Please use the images below with caution, as a place to start. There are lots of other more robust methods to carry out all of these analyses. We recommend this book by Rand Wilcox.

Tests for Association

Tests for Differences

Descriptive Stats

You can also… look at your summary spreadsheets!
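A sketch of the calls behind the output below: base summary(), then psych's describe() and describeBy():

```r
library(psych)

summary(flanker_long)      # factor counts plus five-number summaries
describe(flanker)          # mean, sd, skew, kurtosis, se for every column
describe(flanker$Age)      # or for a single variable
describeBy(flanker, group = flanker$Group)  # all of the above, split by group
```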

##       ID                Group          Age        Gender       WASI      
##  Length:423         Music  :144   Min.   :77.00   F:189   Min.   : 56.0  
##  Class :character   Sport  :135   1st Qu.:80.00   M:234   1st Qu.: 89.0  
##  Mode  :character   Control:144   Median :82.00           Median :105.0  
##                                   Mean   :82.11           Mean   :103.7  
##                                   3rd Qu.:84.00           3rd Qu.:113.0  
##                                   Max.   :88.00           Max.   :154.0  
##  Year           Condition    AccStandard       RTStandard    
##  Y3:141   Congruent  :141   Min.   :0.0088   Min.   : 361.0  
##  Y4:141   Incongruent:141   1st Qu.:0.3846   1st Qu.: 528.0  
##  Y5:141   Neutral    :141   Median :0.6823   Median : 621.4  
##           Total      :  0   Mean   :0.6263   Mean   : 635.2  
##                             3rd Qu.:0.9020   3rd Qu.: 734.1  
##                             Max.   :1.0000   Max.   :1223.3
##                            vars  n   mean     sd median trimmed    mad
## ID*                           1 47    NaN     NA     NA     NaN     NA
## Group*                        2 47   2.00   0.83   2.00    2.00   1.48
## Age                           3 47  82.11   2.97  82.00   82.05   2.97
## Gender*                       4 47   1.55   0.50   2.00    1.56   0.00
## WASI                          5 47 103.68  19.44 105.00  103.26  16.31
## Y4_Total_AccStandard          6 47   0.53   0.33   0.63    0.54   0.41
## Y4_Incongruent_AccStandard    7 47   0.49   0.28   0.41    0.48   0.37
## Y4_Neutral_AccStandard        8 47   0.40   0.27   0.38    0.39   0.32
## Y4_Congruent_AccStandard      9 47   0.86   0.09   0.87    0.86   0.12
## Y4_Incongruent_RTStandard    10 47 678.49 157.56 670.83  664.31 126.27
## Y4_Neutral_RTStandard        11 47 566.64 116.73 545.71  560.36 122.84
## Y4_Congruent_RTStandard      12 47 631.51 117.30 632.25  628.67 117.13
## Y4_Total_RTStandard          13 47 658.97 133.12 654.75  651.15 130.84
## Y5_Total_AccStandard         14 47   0.52   0.22   0.54    0.52   0.25
## Y5_Incongruent_AccStandard   15 47   0.65   0.28   0.64    0.67   0.34
## Y5_Neutral_AccStandard       16 47   0.84   0.27   1.00    0.89   0.00
## Y5_Congruent_AccStandard     17 47   0.68   0.29   0.73    0.71   0.40
## Y5_Incongruent_RTStandard    18 47 616.70 113.10 587.67  604.62  85.74
## Y5_Neutral_RTStandard        19 47 498.91  99.73 469.71  485.86  59.52
## Y5_Congruent_RTStandard      20 47 577.40 108.93 563.25  565.63 102.67
## Y5_Total_RTStandard          21 47 593.89 106.20 574.88  582.42  84.42
## Y3_Total_AccStandard         22 47   0.52   0.26   0.54    0.52   0.30
## Y3_Incongruent_AccStandard   23 47   0.52   0.29   0.50    0.52   0.33
## Y3_Neutral_AccStandard*      24 47   0.65   0.32   0.67    0.68   0.38
## Y3_Congruent_AccStandard     25 47   0.54   0.27   0.58    0.54   0.28
## Y3_Incongruent_RTStandard    26 47 738.52  96.65 748.35  741.96 134.08
## Y3_Neutral_RTStandard        27 47 728.52 118.28 743.65  730.92 150.09
## Y3_Congruent_RTStandard      28 47 679.91 123.36 664.67  677.53 165.51
## Y3_Total_RTStandard          29 47 671.95 116.67 672.04  667.23 144.65
##                               min     max  range  skew kurtosis    se
## ID*                           Inf    -Inf   -Inf    NA       NA    NA
## Group*                       1.00    3.00   2.00  0.00    -1.59  0.12
## Age                         77.00   88.00  11.00  0.06    -1.01  0.43
## Gender*                      1.00    2.00   1.00 -0.21    -2.00  0.07
## WASI                        56.00  154.00  98.00  0.16     0.32  2.84
## Y4_Total_AccStandard         0.00    0.96   0.96 -0.32    -1.46  0.05
## Y4_Incongruent_AccStandard   0.07    0.96   0.90  0.18    -1.41  0.04
## Y4_Neutral_AccStandard       0.01    0.96   0.95  0.22    -1.22  0.04
## Y4_Congruent_AccStandard     0.71    1.00   0.29 -0.11    -1.23  0.01
## Y4_Incongruent_RTStandard  423.50 1223.33 799.83  1.08     1.67 22.98
## Y4_Neutral_RTStandard      361.00  856.43 495.43  0.49    -0.31 17.03
## Y4_Congruent_RTStandard    415.50  967.25 551.75  0.24    -0.12 17.11
## Y4_Total_RTStandard        425.56 1053.12 627.56  0.61     0.26 19.42
## Y5_Total_AccStandard         0.01    0.96   0.95  0.01    -0.66  0.03
## Y5_Incongruent_AccStandard   0.02    1.00   0.98 -0.46    -0.75  0.04
## Y5_Neutral_AccStandard       0.06    1.00   0.94 -1.53     0.86  0.04
## Y5_Congruent_AccStandard     0.01    1.00   0.99 -0.51    -0.82  0.04
## Y5_Incongruent_RTStandard  465.83 1016.50 550.67  1.29     1.96 16.50
## Y5_Neutral_RTStandard      371.29  836.29 465.00  1.40     1.67 14.55
## Y5_Congruent_RTStandard    399.25  910.25 511.00  1.11     1.25 15.89
## Y5_Total_RTStandard        439.69  974.62 534.94  1.36     2.26 15.49
## Y3_Total_AccStandard         0.08    0.98   0.89  0.08    -1.13  0.04
## Y3_Incongruent_AccStandard   0.03    0.99   0.96 -0.11    -1.10  0.04
## Y3_Neutral_AccStandard*      0.02    1.00   0.98 -0.55    -0.96  0.05
## Y3_Congruent_AccStandard     0.02    1.00   0.98 -0.12    -0.97  0.04
## Y3_Incongruent_RTStandard  548.19  898.65 350.46 -0.22    -1.17 14.10
## Y3_Neutral_RTStandard      510.27  899.53 389.26 -0.12    -1.30 17.25
## Y3_Congruent_RTStandard    500.02  890.86 390.84  0.10    -1.43 17.99
## Y3_Total_RTStandard        508.60  876.45 367.85  0.24    -1.28 17.02
##    vars  n  mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 47 82.11 2.97     82   82.05 2.97  77  88    11 0.06    -1.01 0.43
## 
##  Descriptive statistics by group 
## group: Music
##                            vars  n   mean     sd median trimmed    mad
## ID*                           1 16    NaN     NA     NA     NaN     NA
## Group*                        2 16   1.00   0.00   1.00    1.00   0.00
## Age                           3 16  82.31   3.11  82.50   82.29   3.71
## Gender*                       4 16   1.38   0.50   1.00    1.36   0.00
## WASI                          5 16 108.00  24.54 105.00  107.71  14.83
## Y4_Total_AccStandard          6 16   0.52   0.37   0.71    0.53   0.36
## Y4_Incongruent_AccStandard    7 16   0.47   0.29   0.39    0.46   0.36
## Y4_Neutral_AccStandard        8 16   0.35   0.22   0.34    0.34   0.22
## Y4_Congruent_AccStandard      9 16   0.88   0.09   0.90    0.88   0.09
## Y4_Incongruent_RTStandard    10 16 651.10 145.37 602.17  643.40  97.36
## Y4_Neutral_RTStandard        11 16 547.71 125.41 514.71  538.99  86.10
## Y4_Congruent_RTStandard      12 16 603.94 125.83 605.38  601.62 154.93
## Y4_Total_RTStandard          13 16 634.77 134.15 591.62  631.27  98.55
## Y5_Total_AccStandard         14 16   0.58   0.23   0.55    0.58   0.30
## Y5_Incongruent_AccStandard   15 16   0.80   0.32   1.00    0.84   0.00
## Y5_Neutral_AccStandard       16 16   0.80   0.27   1.00    0.82   0.00
## Y5_Congruent_AccStandard     17 16   0.97   0.09   1.00    0.98   0.00
## Y5_Incongruent_RTStandard    18 16 619.18 133.77 581.50  601.35 121.82
## Y5_Neutral_RTStandard        19 16 503.20 121.38 462.21  488.83  66.82
## Y5_Congruent_RTStandard      20 16 577.23 127.44 567.62  566.16  90.62
## Y5_Total_RTStandard          21 16 596.65 129.59 568.19  580.86  95.30
## Y3_Total_AccStandard         22 16   0.48   0.30   0.45    0.48   0.42
## Y3_Incongruent_AccStandard   23 16   0.65   0.26   0.68    0.67   0.21
## Y3_Neutral_AccStandard*      24 16   0.92   0.14   1.00    0.93   0.00
## Y3_Congruent_AccStandard     25 16   0.62   0.24   0.65    0.63   0.24
## Y3_Incongruent_RTStandard    26 16 750.66 102.78 770.23  756.51 116.85
## Y3_Neutral_RTStandard        27 16 723.42 119.32 750.25  723.69 142.80
## Y3_Congruent_RTStandard      28 16 683.47 131.26 661.24  681.57 177.69
## Y3_Total_RTStandard          29 16 671.29 127.65 686.88  667.94 166.71
##                               min     max  range  skew kurtosis    se
## ID*                           Inf    -Inf   -Inf    NA       NA    NA
## Group*                       1.00    1.00   0.00   NaN      NaN  0.00
## Age                         77.00   88.00  11.00  0.04    -1.13  0.78
## Gender*                      1.00    2.00   1.00  0.47    -1.89  0.12
## WASI                        66.00  154.00  88.00  0.32    -0.85  6.14
## Y4_Total_AccStandard         0.02    0.96   0.94 -0.24    -1.81  0.09
## Y4_Incongruent_AccStandard   0.07    0.96   0.89  0.24    -1.43  0.07
## Y4_Neutral_AccStandard       0.01    0.75   0.74  0.09    -1.04  0.05
## Y4_Congruent_AccStandard     0.71    0.98   0.27 -0.53    -1.29  0.02
## Y4_Incongruent_RTStandard  426.67  983.33 556.67  0.71    -0.26 36.34
## Y4_Neutral_RTStandard      361.00  856.43 495.43  0.86     0.06 31.35
## Y4_Congruent_RTStandard    415.50  824.75 409.25  0.01    -1.44 31.46
## Y4_Total_RTStandard        425.56  892.94 467.38  0.52    -0.77 33.54
## Y5_Total_AccStandard         0.22    0.96   0.74  0.12    -1.31  0.06
## Y5_Incongruent_AccStandard   0.02    1.00   0.98 -1.37     0.35  0.08
## Y5_Neutral_AccStandard       0.30    1.00   0.70 -0.78    -1.20  0.07
## Y5_Congruent_AccStandard     0.75    1.00   0.25 -2.06     2.40  0.02
## Y5_Incongruent_RTStandard  471.50 1016.50 545.00  1.42     2.19 33.44
## Y5_Neutral_RTStandard      371.29  836.29 465.00  1.35     1.06 30.34
## Y5_Congruent_RTStandard    399.25  910.25 511.00  1.01     0.57 31.86
## Y5_Total_RTStandard        439.69  974.62 534.94  1.41     1.90 32.40
## Y3_Total_AccStandard         0.08    0.92   0.83  0.11    -1.65  0.08
## Y3_Incongruent_AccStandard   0.09    0.98   0.88 -0.81    -0.22  0.06
## Y3_Neutral_AccStandard*      0.67    1.00   0.33 -1.03    -0.75  0.03
## Y3_Congruent_AccStandard     0.17    0.95   0.78 -0.35    -1.15  0.06
## Y3_Incongruent_RTStandard  548.19  871.26 323.07 -0.47    -1.26 25.69
## Y3_Neutral_RTStandard      543.48  899.48 356.00 -0.07    -1.45 29.83
## Y3_Congruent_RTStandard    504.44  889.12 384.68  0.23    -1.54 32.81
## Y3_Total_RTStandard        515.51  874.02 358.52  0.13    -1.55 31.91
## -------------------------------------------------------- 
## group: Sport
##                            vars  n   mean     sd median trimmed    mad
## ID*                           1 15    NaN     NA     NA     NaN     NA
## Group*                        2 15   2.00   0.00   2.00    2.00   0.00
## Age                           3 15  81.40   3.11  82.00   81.31   2.97
## Gender*                       4 15   1.73   0.46   2.00    1.77   0.00
## WASI                          5 15 107.20  14.09 111.00  107.85  14.83
## Y4_Total_AccStandard          6 15   0.52   0.32   0.50    0.52   0.48
## Y4_Incongruent_AccStandard    7 15   0.54   0.29   0.67    0.54   0.30
## Y4_Neutral_AccStandard        8 15   0.52   0.32   0.66    0.53   0.23
## Y4_Congruent_AccStandard      9 15   0.81   0.07   0.80    0.81   0.10
## Y4_Incongruent_RTStandard    10 15 749.35 179.27 756.00  737.96 113.17
## Y4_Neutral_RTStandard        11 15 609.09 110.96 640.43  611.90  79.64
## Y4_Congruent_RTStandard      12 15 695.01 117.13 692.75  692.70  92.79
## Y4_Total_RTStandard          13 15 717.31 138.55 742.62  713.09  85.25
## Y5_Total_AccStandard         14 15   0.58   0.19   0.56    0.58   0.16
## Y5_Incongruent_AccStandard   15 15   0.52   0.28   0.52    0.52   0.28
## Y5_Neutral_AccStandard       16 15   0.73   0.35   0.89    0.76   0.16
## Y5_Congruent_AccStandard     17 15   0.64   0.21   0.70    0.64   0.14
## Y5_Incongruent_RTStandard    18 15 668.43 100.05 644.67  659.50  82.78
## Y5_Neutral_RTStandard        19 15 525.10  78.39 509.86  517.22  53.16
## Y5_Congruent_RTStandard      20 15 624.73 111.78 608.75  616.56  84.14
## Y5_Total_RTStandard          21 15 636.58  94.11 623.50  627.50  64.49
## Y3_Total_AccStandard         22 15   0.47   0.21   0.49    0.45   0.20
## Y3_Incongruent_AccStandard   23 15   0.46   0.27   0.45    0.45   0.29
## Y3_Neutral_AccStandard*      24 15   0.65   0.26   0.66    0.66   0.24
## Y3_Congruent_AccStandard     25 15   0.49   0.25   0.57    0.49   0.19
## Y3_Incongruent_RTStandard    26 15 740.88  91.40 739.97  739.37  99.22
## Y3_Neutral_RTStandard        27 15 771.80 109.39 795.94  779.54  99.85
## Y3_Congruent_RTStandard      28 15 689.54 139.12 757.00  688.63 158.24
## Y3_Total_RTStandard          29 15 660.70 122.91 623.46  655.80 140.95
##                               min     max  range  skew kurtosis    se
## ID*                           Inf    -Inf   -Inf    NA       NA    NA
## Group*                       2.00    2.00   0.00   NaN      NaN  0.00
## Age                         77.00   87.00  10.00  0.08    -1.53  0.80
## Gender*                      1.00    2.00   1.00 -0.95    -1.16  0.12
## WASI                        82.00  124.00  42.00 -0.59    -1.12  3.64
## Y4_Total_AccStandard         0.06    0.95   0.88 -0.09    -1.66  0.08
## Y4_Incongruent_AccStandard   0.12    0.96   0.84 -0.14    -1.71  0.08
## Y4_Neutral_AccStandard       0.01    0.96   0.95 -0.42    -1.54  0.08
## Y4_Congruent_AccStandard     0.71    0.90   0.19  0.01    -1.63  0.02
## Y4_Incongruent_RTStandard  423.50 1223.33 799.83  0.75     1.17 46.29
## Y4_Neutral_RTStandard      375.71  805.86 430.14 -0.37    -0.58 28.65
## Y4_Congruent_RTStandard    452.75  967.25 514.50  0.28     0.37 30.24
## Y4_Total_RTStandard        436.38 1053.12 616.75  0.30     0.60 35.77
## Y5_Total_AccStandard         0.26    0.89   0.63  0.10    -0.98  0.05
## Y5_Incongruent_AccStandard   0.02    0.97   0.95  0.03    -1.15  0.07
## Y5_Neutral_AccStandard       0.06    1.00   0.94 -0.82    -1.13  0.09
## Y5_Congruent_AccStandard     0.29    0.99   0.71 -0.21    -0.91  0.05
## Y5_Incongruent_RTStandard  562.33  890.67 328.33  0.90    -0.58 25.83
## Y5_Neutral_RTStandard      431.14  721.43 290.29  0.99     0.19 20.24
## Y5_Congruent_RTStandard    460.75  895.00 434.25  0.67     0.06 28.86
## Y5_Total_RTStandard        517.75  873.38 355.62  1.00     0.27 24.30
## Y3_Total_AccStandard         0.13    0.98   0.84  0.47    -0.15  0.06
## Y3_Incongruent_AccStandard   0.08    0.91   0.84  0.24    -1.29  0.07
## Y3_Neutral_AccStandard*      0.09    0.98   0.88 -0.75    -0.45  0.07
## Y3_Congruent_AccStandard     0.02    0.98   0.97 -0.22    -0.71  0.07
## Y3_Incongruent_RTStandard  602.82  898.65 295.83  0.08    -1.28 23.60
## Y3_Neutral_RTStandard      543.41  899.53 356.11 -0.65    -0.86 28.24
## Y3_Congruent_RTStandard    500.02  890.86 390.84 -0.11    -1.71 35.92
## Y3_Total_RTStandard        508.60  876.45 367.85  0.45    -1.22 31.73
## -------------------------------------------------------- 
## group: Control
##                            vars  n   mean     sd median trimmed    mad
## ID*                           1 16    NaN     NA     NA     NaN     NA
## Group*                        2 16   3.00   0.00   3.00    3.00   0.00
## Age                           3 16  82.56   2.76  82.00   82.50   2.97
## Gender*                       4 16   1.56   0.51   2.00    1.57   0.00
## WASI                          5 16  96.06  16.65  98.00   97.07  14.08
## Y4_Total_AccStandard          6 16   0.55   0.31   0.62    0.56   0.26
## Y4_Incongruent_AccStandard    7 16   0.45   0.28   0.40    0.44   0.28
## Y4_Neutral_AccStandard        8 16   0.33   0.23   0.27    0.32   0.21
## Y4_Congruent_AccStandard      9 16   0.89   0.08   0.88    0.89   0.10
## Y4_Incongruent_RTStandard    10 16 639.45 132.49 652.83  621.52  99.46
## Y4_Neutral_RTStandard        11 16 545.79 109.17 547.43  536.42 113.10
## Y4_Congruent_RTStandard      12 16 599.56  87.33 582.62  598.89  78.58
## Y4_Total_RTStandard          13 16 628.47 115.79 631.00  617.48 103.74
## Y5_Total_AccStandard         14 16   0.41   0.21   0.41    0.42   0.26
## Y5_Incongruent_AccStandard   15 16   0.63   0.16   0.64    0.62   0.11
## Y5_Neutral_AccStandard       16 16   0.99   0.04   1.00    1.00   0.00
## Y5_Congruent_AccStandard     17 16   0.44   0.24   0.46    0.44   0.27
## Y5_Incongruent_RTStandard    18 16 565.72  80.83 539.33  557.04  53.13
## Y5_Neutral_RTStandard        19 16 470.08  91.86 453.93  457.33  49.03
## Y5_Congruent_RTStandard      20 16 533.20  64.52 520.12  527.93  62.08
## Y5_Total_RTStandard          21 16 551.11  75.78 534.94  543.52  53.56
## Y3_Total_AccStandard         22 16   0.62   0.24   0.64    0.63   0.32
## Y3_Incongruent_AccStandard   23 16   0.45   0.30   0.47    0.44   0.29
## Y3_Neutral_AccStandard*      24 16   0.40   0.28   0.41    0.39   0.36
## Y3_Congruent_AccStandard     25 16   0.51   0.31   0.47    0.51   0.37
## Y3_Incongruent_RTStandard    26 16 724.16  99.53 723.89  727.07 111.46
## Y3_Neutral_RTStandard        27 16 693.05 119.40 666.95  691.52 120.32
## Y3_Congruent_RTStandard      28 16 667.32 105.34 657.52  667.11 132.64
## Y3_Total_RTStandard          29 16 683.15 105.32 701.66  679.91 139.51
##                               min     max  range  skew kurtosis    se
## ID*                           Inf    -Inf   -Inf    NA       NA    NA
## Group*                       3.00    3.00   0.00   NaN      NaN  0.00
## Age                         78.00   88.00  10.00  0.20    -1.03  0.69
## Gender*                      1.00    2.00   1.00 -0.23    -2.07  0.13
## WASI                        56.00  122.00  66.00 -0.56     0.04  4.16
## Y4_Total_AccStandard         0.00    0.92   0.92 -0.59    -1.16  0.08
## Y4_Incongruent_AccStandard   0.07    0.94   0.87  0.40    -1.29  0.07
## Y4_Neutral_AccStandard       0.03    0.78   0.75  0.46    -1.09  0.06
## Y4_Congruent_AccStandard     0.74    1.00   0.26 -0.24    -1.25  0.02
## Y4_Incongruent_RTStandard  487.50 1042.33 554.83  1.52     2.62 33.12
## Y4_Neutral_RTStandard      392.29  830.57 438.29  0.89     0.46 27.29
## Y4_Congruent_RTStandard    456.75  751.75 295.00  0.18    -1.15 21.83
## Y4_Total_RTStandard        468.62  942.19 473.56  0.98     0.85 28.95
## Y5_Total_AccStandard         0.01    0.71   0.70 -0.13    -1.17  0.05
## Y5_Incongruent_AccStandard   0.33    0.98   0.65 -0.01    -0.26  0.04
## Y5_Neutral_AccStandard       0.83    1.00   0.17 -3.28     9.36  0.01
## Y5_Congruent_AccStandard     0.01    0.81   0.81 -0.21    -1.21  0.06
## Y5_Incongruent_RTStandard  465.83  787.17 321.33  1.28     1.07 20.21
## Y5_Neutral_RTStandard      384.71  734.00 349.29  1.67     2.03 22.96
## Y5_Congruent_RTStandard    463.25  677.00 213.75  0.80    -0.46 16.13
## Y5_Total_RTStandard        466.62  741.81 275.19  1.21     0.54 18.95
## Y3_Total_AccStandard         0.23    0.94   0.71 -0.10    -1.44  0.06
## Y3_Incongruent_AccStandard   0.03    0.99   0.96  0.22    -1.02  0.08
## Y3_Neutral_AccStandard*      0.02    0.91   0.89  0.33    -1.06  0.07
## Y3_Congruent_AccStandard     0.05    1.00   0.95  0.18    -1.39  0.08
## Y3_Incongruent_RTStandard  548.36  859.27 310.91 -0.16    -1.40 24.88
## Y3_Neutral_RTStandard      510.27  897.27 387.00  0.34    -1.18 29.85
## Y3_Congruent_RTStandard    513.41  824.12 310.71  0.03    -1.64 26.34
## Y3_Total_RTStandard        555.65  856.01 300.36  0.18    -1.56 26.33

Assumption testing

DISCLAIMER: Basically all of basic statistics relies on a set of assumptions, namely that your data are normally distributed and have homogeneity of variance. It would be really lovely if we could just test these assumptions, make sure the tests don’t fail, and then move on to using our classic tests. However, that is not how this works. The Shapiro-Wilk test is good at telling you when your data is NOT normal (you can trust a significant result), but just because it doesn’t fail doesn’t mean you can accept that the data is normal. Basically, even if you test for things, you never really know. So, please use classic methods with caution.

Normality and Skewness

## [1] 0.06103718

What is this??

Skewness is a measure of symmetry, basically. Negative skewness = mean < median (left skew). Positive skewness = mean > median (right skew).

When you should worry: a skewness z-score (skewness divided by its standard error) greater than 1.96 or less than -1.96.

Perfectly non-skewed data has mean = median and skewness = 0.

Kurtosis

## [1] -0.9274062

What is this??

Kurtosis is similar to skewness, in that it describes the data’s distribution. It measures the tail shape of the distribution / how “peaked” the distribution is.

A thin-tailed distribution has too much data in the middle. A fat-tailed distribution has too much data in the tails.

A perfect normal distribution has 0 excess kurtosis.

The acceptable range for this is similar to skewness, above.
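The numbers above likely come from e1071’s skewness() and kurtosis(). As a sketch of what those functions compute, here is the moment-based version in base R (the vector x is made up for illustration):

```r
# Made-up example vector; e1071::skewness(x) and e1071::kurtosis(x)
# compute essentially the same standardized moments
x <- c(1, 2, 3, 4, 5)

# Third standardized moment: 0 for a perfectly symmetric sample
skew <- mean((x - mean(x))^3) / (mean((x - mean(x))^2))^(3 / 2)

# Excess kurtosis: fourth standardized moment minus 3 (0 for a normal distribution)
kurt <- mean((x - mean(x))^4) / (mean((x - mean(x))^2))^2 - 3

skew  # 0 (x is symmetric around its mean)
kurt  # -1.3 (flatter than a normal distribution)
```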

Test for normality

## 
##  Shapiro-Wilk normality test
## 
## data:  flanker$Y4_Total_AccStandard
## W = 0.8859, p-value = 0.0001698

If the test fails (data nonnormal), p is less than 0.05.
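The output above comes from shapiro.test(flanker$Y4_Total_AccStandard). A self-contained version, with a made-up vector so it runs on its own:

```r
# Shapiro-Wilk normality test; p < .05 suggests the data are NOT normal.
# With the real data: shapiro.test(flanker$Y4_Total_AccStandard)
x <- c(4.1, 3.8, 5.0, 4.4, 4.7, 3.9, 4.2, 4.6, 4.0, 4.3)  # invented values
res <- shapiro.test(x)
res$statistic  # W, at most 1
res$p.value
```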

How to report

“Age was normally distributed with skewness of 0.06 and kurtosis of -0.93.”

Categorical Data Analysis

Categorical data is data that is not continuous. This includes variables like Gender, or things that have a yes or no answer. We can look at this data in many ways, but here is one. This test will tell us if there are gender differences between groups.

Chi-Square

## 
##  Pearson's Chi-squared test
## 
## data:  flanker_long$Group and flanker_long$Gender
## X-squared = 36.273, df = 2, p-value = 1.328e-08

What is this?? This tests whether 2 categorical variables are related in some population.

If the p value is less than 0.05, we reject the null hypothesis that the variables are independent. Basically, the groups differ in gender distribution.
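The output above comes from chisq.test(flanker_long$Group, flanker_long$Gender). A self-contained sketch with an invented contingency table:

```r
# Counts are invented for illustration; with raw data you would call
# chisq.test(flanker_long$Group, flanker_long$Gender)
tab <- matrix(c(20, 30, 25,
                30, 20, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("F", "M"),
                              Group  = c("Music", "Sport", "Control")))
res <- chisq.test(tab)
res$parameter  # df = (rows - 1) * (cols - 1) = 2
res$p.value
```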

How to report

“An association between group and gender was observed, χ2(2) = 36.27, p < 0.01.”

2 independent groups

T test

## 
##  Welch Two Sample t-test
## 
## data:  music$Y4_Total_AccStandard and control$Y4_Total_AccStandard
## t = 0.17446, df = 32.984, p-value = 0.8626
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2121033  0.2518894
## sample estimates:
## mean of x mean of y 
## 0.5714368 0.5515437
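The output above comes from t.test(music$Y4_Total_AccStandard, control$Y4_Total_AccStandard) — Welch’s version, which does not assume equal variances, is R’s default. A self-contained sketch with made-up accuracy scores:

```r
# Welch two-sample t test (unequal variances assumed by default).
# With the real data: t.test(music$Y4_Total_AccStandard, control$Y4_Total_AccStandard)
g1 <- c(0.55, 0.60, 0.52, 0.58, 0.61)  # invented scores for group 1
g2 <- c(0.50, 0.57, 0.49, 0.56, 0.53)  # invented scores for group 2
res <- t.test(g1, g2)
res$p.value
res$estimate  # the two group means
```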

Wilcox (nonparametric)

This is the classic non-parametric alternative to the t test. It performs HORRIBLY if you have tied values, so watch out for that.

## 
##  Wilcoxon rank sum test
## 
## data:  music$Y4_Total_AccStandard and control$Y4_Total_AccStandard
## W = 171, p-value = 0.5448
## alternative hypothesis: true location shift is not equal to 0
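The output above comes from wilcox.test(music$Y4_Total_AccStandard, control$Y4_Total_AccStandard). A self-contained sketch — note that with ties R cannot compute an exact p value and warns you, which is why ties are a problem:

```r
# Rank-sum test; W counts the (x, y) pairs with x > y.
# With the real data: wilcox.test(music$Y4_Total_AccStandard, control$Y4_Total_AccStandard)
g1 <- c(1, 2, 3, 4, 5)   # invented; every value below every g2 value
g2 <- c(6, 7, 8, 9, 10)
res <- wilcox.test(g1, g2)
res$statistic  # W = 0 here, since no g1 value exceeds any g2 value
```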

Cliff’s (robust)

This is a robust alternative to the t test and it performs well even with tied values. It tests three different hypotheses (equal, greater, less than).

## $n1
## [1] 19
## 
## $n2
## [1] 16
## 
## $cl
## [1] -0.260875
## 
## $cu
## [1] 0.4764344
## 
## $d
## [1] 0.125
## 
## $sqse.d
## [1] 0.04094187
## 
## $phat
## [1] 0.4375
## 
## $summary.dvals
##      P(X<Y) P(X=Y) P(X>Y)
## [1,] 0.4375      0 0.5625
## 
## $ci.p
## [1] 0.2617828 0.6304375

This plot shows the distribution of D, where D is the difference between a randomly sampled observation from group 1 and a randomly sampled observation from group 2. D will be symmetric around 0 when the distributions are identical.

## $n1
## [1] 19
## 
## $n2
## [1] 16
## 
## $d.hat
## [1] 0.125
## 
## $d.ci
## [1] -0.2608750  0.4764344
## 
## $p.value
## [1] 0.54
## 
## $p.hat
## [1] 0.4375
## 
## $p.ci
## [1] 0.2617828 0.6304375
## 
## $summary.dvals
##      P(X<Y) P(X=Y) P(X>Y)
## [1,] 0.4375      0 0.5625
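The two outputs above look like Wilcox’s cid() and cidv2() from WRS. Conceptually, Cliff’s delta is just the probability that a group-1 observation beats a group-2 observation, minus the reverse. A base-R sketch (hypothetical helper, matching the sign convention in the output above, where d = P(X>Y) − P(X<Y)):

```r
# Hypothetical base-R illustration of Cliff's delta; the WRS functions
# additionally give confidence intervals and a p value
cliffs_d <- function(x, y) {
  diffs <- outer(x, y, "-")  # all pairwise x_i - y_j differences
  (sum(diffs > 0) - sum(diffs < 0)) / length(diffs)
}

cliffs_d(c(1, 2, 3), c(4, 5, 6))  # every x below every y -> -1
cliffs_d(c(1, 2, 3), c(1, 2, 3))  # identical samples     ->  0
```

Ties simply fall out of both counts, which is why this statistic (unlike the rank-sum test) behaves well with tied values.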

More than 2 independent groups

1 Way ANOVA

This works for Wide

##             Df Sum Sq Mean Sq F value Pr(>F)
## Group        2  0.010 0.00487   0.043  0.958
## Residuals   44  4.944 0.11236

But also for long

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Year          2   2.10  1.0489   11.85 9.83e-06 ***
## Residuals   420  37.17  0.0885                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
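The long-format table above was presumably produced by something like summary(aov(AccStandard ~ Year, data = flanker_long)). A self-contained sketch with simulated data:

```r
# Simulated long-format data; with the real data the call would be roughly
# summary(aov(AccStandard ~ Year, data = flanker_long))
dat <- data.frame(
  score = c(1, 2, 3,  4, 5, 6,  7, 8, 9),
  group = factor(rep(c("Y3", "Y4", "Y5"), each = 3))
)
fit <- aov(score ~ group, data = dat)
summary(fit)  # Df: 2 for group, 6 for residuals
```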

Between/within

2-Way ANOVA

But let’s say you wanted to look at 2 variables. You might want to do a 2-way ANOVA.

## Warning in aov(AccStandard ~ Condition * Group + Error(ID/(Condition *
## Group)), : Error() model is singular
## 
## Error: ID
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## Group      2  1.821  0.9106   10.96 0.000138 ***
## Residuals 44  3.657  0.0831                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: ID:Condition
##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## Condition        2  1.423  0.7115  15.113 2.28e-06 ***
## Condition:Group  4  0.238  0.0594   1.262    0.291    
## Residuals       88  4.143  0.0471                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Within
##            Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 282  27.98 0.09924
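Judging by the warning at the top, the call above used Error(ID/(Condition * Group)); including the between-subjects factor (Group) inside Error() is what makes the Error() model singular. Nesting only the within-subject factor under ID avoids that. A self-contained sketch:

```r
# Simulated: 12 subjects in 2 groups (between), each measured in 3 conditions (within).
# With the real data: aov(AccStandard ~ Condition * Group + Error(ID / Condition),
#                         data = flanker_long)
set.seed(1)
dat <- expand.grid(ID = factor(1:12), Condition = factor(c("Cong", "Neut", "Incong")))
dat$Group <- factor(ifelse(as.integer(dat$ID) %% 2 == 0, "Control", "Music"))
dat$Acc   <- runif(nrow(dat))  # made-up accuracy values

fit <- aov(Acc ~ Condition * Group + Error(ID / Condition), data = dat)
class(fit)  # "aovlist": one stratum per error term, as in the output above
```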

With a Covariate

If, for example, you had group differences in Age, you might want to add it in as a covariate.

Just change that * to a +

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Condition     2   1.42  0.7115   7.885 0.000435 ***
## Age           1   0.04  0.0367   0.407 0.524081    
## Residuals   419  37.81  0.0902                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
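Concretely, “change that * to a +” means something like aov(AccStandard ~ Condition + Age, data = flanker_long). A self-contained sketch:

```r
# Simulated ANCOVA-style model: Condition as a factor, Age as a covariate,
# joined with "+" rather than "*" (no interaction term)
set.seed(2)
dat <- data.frame(
  Acc       = runif(30),                                # made-up outcome
  Condition = factor(rep(c("Cong", "Neut", "Incong"), 10)),
  Age       = sample(77:88, 30, replace = TRUE)          # made-up covariate
)
fit <- aov(Acc ~ Condition + Age, data = dat)
summary(fit)  # Df: 2 (Condition), 1 (Age), 26 (residuals)
```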

Bootstrap methods (ROBUST!)

See page 441 of Rand Wilcox’s book (section 11.6.3); please read it to understand how bootstrapping works. This is probably the most robust way to do a between/within analysis. You can also just do a single within or single between factor.

Data setup

## [1] "Group Levels:"
##       [,1] [,2]
##  [1,]    1    1
##  [2,]    1    2
##  [3,]    1    3
##  [4,]    2    1
##  [5,]    2    2
##  [6,]    2    3
##  [7,]    3    1
##  [8,]    3    2
##  [9,]    3    3
## Error in formula.default(object, env = baseenv()): invalid formula
## Error in formula.default(object, env = baseenv()): invalid formula
## Error in formula.default(object, env = baseenv()): invalid formula

Effect Size Calculators

When reporting statistics in papers, you need to include an effect size, such as eta squared. These values do not immediately appear with this call. So, you have to do this:

##                eta.sq eta.sq.part
## Year       0.05342401  0.05838327
## Group      0.04638160  0.05108024
## Year:Group 0.03856145  0.04283681
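The table above is what lsr::etaSquared(model) returns. For a one-way design, eta squared is just SS_effect / SS_total, which you can verify by hand:

```r
# Hand calculation of eta squared; lsr::etaSquared(fit) reports the same
# value (plus partial eta squared) for you
dat <- data.frame(
  score = c(1, 2, 3,  4, 5, 6,  7, 8, 9),
  group = factor(rep(c("A", "B", "C"), each = 3))
)
fit <- aov(score ~ group, data = dat)
ss  <- summary(fit)[[1]]$`Sum Sq`  # c(SS_between, SS_residual)
eta_sq <- ss[1] / sum(ss)          # SS_effect / SS_total
eta_sq  # 0.9 for this made-up data
```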

Because papers, especially older ones, sometimes don’t report it! Here is how to calculate it.

Hedges’ g

Kind of like Cohen’s d, but with a correction that reduces small-sample bias

from mean and sd

## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: mean and sd to effect size Hedges' g
##     Effect Size:  -0.3768
##  Standard Error:   0.3528
##        Variance:   0.1245
##        Lower CI:  -1.0683
##        Upper CI:   0.3147
##          Weight:   8.0333
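The output above comes from esc’s mean-and-sd converter, roughly esc_mean_sd(grp1m = ..., grp1sd = ..., grp1n = ..., grp2m = ..., grp2sd = ..., grp2n = ..., es.type = "g"). Under the hood, the conversion is Cohen’s d from the pooled SD with a small-sample correction; a base-R sketch with invented numbers (esc may use the exact gamma-function form of the correction rather than this common approximation):

```r
# Hedges' g from summary statistics, by hand (all numbers invented)
m1 <- 10; sd1 <- 2; n1 <- 20
m2 <- 8;  sd2 <- 2; n2 <- 20

sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
d <- (m1 - m2) / sd_pooled               # Cohen's d
g <- d * (1 - 3 / (4 * (n1 + n2) - 9))   # common small-sample correction
g  # slightly below d, as the correction shrinks the estimate
```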

from 1way anova

## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: F-value (one-way-Anova) to effect size Hedges' g
##     Effect Size:   0.8603
##  Standard Error:   0.4111
##        Variance:   0.1690
##        Lower CI:   0.0545
##        Upper CI:   1.6661
##          Weight:   5.9163

from mean and SE

## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: mean and se to effect size Hedges' g
##     Effect Size:  -0.1998
##  Standard Error:   0.1920
##        Variance:   0.0369
##        Lower CI:  -0.5760
##        Upper CI:   0.1765
##          Weight:  27.1366

from regression

Unstandardized

## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: unstandardized regression coefficient to effect size Hedges' g
##     Effect Size:   0.6941
##  Standard Error:   0.1328
##        Variance:   0.0176
##        Lower CI:   0.4338
##        Upper CI:   0.9544
##          Weight:  56.7018

Standardized

## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: standardized regression coefficient to effect size Hedges' g
##     Effect Size:   1.9868
##  Standard Error:   0.1569
##        Variance:   0.0246
##        Lower CI:   1.6793
##        Upper CI:   2.2942
##          Weight:  40.6353

from chi sq

## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: chi-squared-value to effect size Cox odds ratios
##     Effect Size:   2.9858
##  Standard Error:   0.3478
##        Variance:   0.1210
##        Lower CI:   1.5101
##        Upper CI:   5.9036
##          Weight:   8.2667

from correlation

## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: point-biserial r to effect size Hedges' g
##     Effect Size:   0.5170
##  Standard Error:   0.1380
##        Variance:   0.0190
##        Lower CI:   0.2465
##        Upper CI:   0.7875
##          Weight:  52.4967

from indep. t test

## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: t-value to effect size Hedges' g
##     Effect Size:   0.3806
##  Standard Error:   0.3516
##        Variance:   0.1236
##        Lower CI:  -0.3085
##        Upper CI:   1.0697
##          Weight:   8.0887

from cohens d

## [1] 1.45856

from regression

unstandardized coef

## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: unstandardized regression coefficient to effect size Hedges' g
##     Effect Size:   0.6941
##  Standard Error:   0.1328
##        Variance:   0.0176
##        Lower CI:   0.4338
##        Upper CI:   0.9544
##          Weight:  56.7018

standardized coef

## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: standardized regression coefficient to effect size Hedges' g
##     Effect Size:  -0.5773
##  Standard Error:   0.1606
##        Variance:   0.0258
##        Lower CI:  -0.8920
##        Upper CI:  -0.2626
##          Weight:  38.7879

Post Hocs, using AOV

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = AccStandard ~ Year, data = flanker_long)
## 
## $Year
##              diff         lwr        upr     p adj
## Y4-Y3 0.008040865 -0.07529259 0.09137432 0.9720074
## Y5-Y3 0.153246940  0.06991348 0.23658040 0.0000565
## Y5-Y4 0.145206075  0.06187262 0.22853953 0.0001475
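As the Fit line shows, the table above comes from TukeyHSD(aov(AccStandard ~ Year, data = flanker_long)). A self-contained sketch with simulated data:

```r
# Tukey's HSD on an aov fit; with the real data:
# TukeyHSD(aov(AccStandard ~ Year, data = flanker_long))
set.seed(3)
dat <- data.frame(
  score = rnorm(30),  # made-up outcome
  Year  = factor(rep(c("Y3", "Y4", "Y5"), each = 10))
)
tk <- TukeyHSD(aov(score ~ Year, data = dat))
tk$Year  # one row per pairwise comparison: diff, lwr, upr, p adj
```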

What is all of this??

We use ANOVA to determine if there are any statistically significant differences between groups on a given dependent variable.

The null hypothesis states that the groups are all the same, and that the means are not statistically different from each other.

If a significant difference is found, it is called a main effect, and must then be interpreted using a post-hoc test. The ANOVA spits out something called an F ratio, which is F = variation BETWEEN samples / variation WITHIN samples. The null hypothesis says that the F ratio is 1.

So, if it is greater than 1, it means that the variation BETWEEN your samples is greater than the variation WITHIN your samples. This means that there is something acting upon your data that makes a change in it that is OUTSIDE of the variation that comes from within individuals.

Post hocs let you know where that main effect is coming from. They are a posteriori tests, which means they are performed after the ANOVA. Performing them before the ANOVA will increase your chances of obtaining a Type 1 error (false positive), and is generally considered poor practice.

However, there is also an argument that using any kind of post hoc AFTER doing your global test (like an F test) completely changes the properties of the post hoc… and you’re not really testing what you think you’re testing. So, things to think about.

How to report

“A 1-way ANOVA revealed a main effect of Year on accuracy, F(2, 420) = 11.85, p < 0.001, η2 = 0.05. Post-hoc analysis using Tukey’s HSD indicated that participants were more accurate in Year 5 than in Year 3 (p < 0.001) and Year 4 (p < 0.001).”

RM ANOVA

Looking across years, it is helpful to do a repeated-measures ANOVA.

It will likely tell you your data is unbalanced, and that is just how it is!!! Who can have perfect retention, really?

Now, see the results.

## $ANOVA
##                 Effect DFn DFd         F            p p<.05        ges
## 2                Group   2  44 10.955254 1.377071e-04     * 0.07218744
## 3                 Year   2  88 16.487581 8.290947e-07     * 0.08224664
## 5            Condition   2  88 15.113204 2.279290e-06     * 0.05730631
## 4           Group:Year   4  88  5.950368 2.761895e-04     * 0.06075578
## 6      Group:Condition   4  88  1.262064 2.910140e-01       0.01005078
## 7       Year:Condition   4 176 24.916370 2.331043e-16     * 0.19494460
## 8 Group:Year:Condition   8 176  6.805202 8.963359e-08     * 0.11682089
## 
## $`Mauchly's Test for Sphericity`
##                 Effect         W          p p<.05
## 3                 Year 0.9883816 0.77782208      
## 4           Group:Year 0.9883816 0.77782208      
## 5            Condition 0.9909077 0.82170173      
## 6      Group:Condition 0.9909077 0.82170173      
## 7       Year:Condition 0.6274167 0.01946062     *
## 8 Group:Year:Condition 0.6274167 0.01946062     *
## 
## $`Sphericity Corrections`
##                 Effect       GGe        p[GG] p[GG]<.05       HFe
## 3                 Year 0.9885151 9.393954e-07         * 1.0347479
## 4           Group:Year 0.9885151 2.951186e-04         * 1.0347479
## 5            Condition 0.9909897 2.492466e-06         * 1.0375199
## 6      Group:Condition 0.9909897 2.911876e-01           1.0375199
## 7       Year:Condition 0.8507300 2.937179e-14         * 0.9306791
## 8 Group:Year:Condition 0.8507300 6.724802e-07         * 0.9306791
##          p[HF] p[HF]<.05
## 3 8.290947e-07         *
## 4 2.761895e-04         *
## 5 2.279290e-06         *
## 6 2.910140e-01          
## 7 2.200106e-15         *
## 8 2.282444e-07         *
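The ANOVA, Mauchly, and correction tables above are the structure of what ez’s ezANOVA() returns. A sketch of the likely call shape, on simulated data so it is self-contained (assumes the ez package is installed; with the real data you would pass flanker_long and its column names):

```r
library(ez)  # assumed installed; provides ezANOVA and .()

# Simulated: Year and Condition within subjects, Group between subjects
set.seed(4)
dat <- expand.grid(ID = factor(1:12), Year = factor(c("Y3", "Y4", "Y5")),
                   Condition = factor(c("Cong", "Neut", "Incong")))
dat$Group <- factor(ifelse(as.integer(dat$ID) %% 2 == 0, "Control", "Music"))
dat$AccStandard <- runif(nrow(dat))  # made-up outcome

res <- ezANOVA(data = dat, dv = AccStandard, wid = ID,
               within = .(Year, Condition), between = Group, type = 3)
res$ANOVA  # includes ges, the generalized eta squared column shown above
```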

What does this mean??

Like a 1-way ANOVA, an RM ANOVA compares groups, but the measurements are related rather than independent. It allows us to look at groups over time.

Here, you can look at interaction effects as well as any main effects. Interaction effects show how the effect of one variable changes as a function of another!

How to report

"A repeated-measures ANOVA revealed a significant main effect of ___ (see above). A significant Year*Group interaction effect was also observed, F(4, 88) = 5.95, p < 0.001, η2 = 0.06. Follow-up analyses indicated that the music group had significantly higher accuracy than the control group only at Year 5 (p < 0.05)."

Sphericity Assumptions

Look! above, it shows you the sphericity assumptions AND the effect sizes. How great is that.

What are sphericity assumptions? Sphericity is the assumption of an RM ANOVA that the variances of the differences between all possible pairs of a within-subjects condition are equal. For example, the variance of the Y3−Y4 differences equals the variance of the Incongruent−Congruent differences. When violated, you can get an inflated F-ratio.

Mauchly’s test of sphericity tells you whether or not the assumption is violated, by giving a p value. When the probability of Mauchly’s test is < 0.05, the assumption has been violated. Therefore, we can conclude that there are significant differences between the variances of the differences.

This test is required when you have more than 2 levels of a given within subjects condition. Here, we have 3 years and 3 conditions, so we definitely should look at it.

GGe = Greenhouse-Geisser epsilon (what you report below)
p[GG] = adjusted p value

You also need the corrected degrees of freedom. Calculate them by:

df(n) x GGe, df(d) x GGe

## [1] 3.40292
## [1] 74.86424
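The two numbers above are just the uncorrected dfs (4 and 88, from the Year:Condition row) scaled by the Greenhouse-Geisser epsilon:

```r
# Corrected degrees of freedom = uncorrected df x Greenhouse-Geisser epsilon
GGe <- 0.8507300        # epsilon for Year:Condition, from the table above
dfn <- 4 * GGe          # corrected numerator df   -> 3.40292
dfd <- 88 * GGe         # corrected denominator df -> 74.86424
round(c(dfn, dfd), 5)
```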

When it is violated, you need to correct the degrees of freedom.

How to report

“Mauchly’s test indicated that the assumption of sphericity had been violated for the Year by Condition interaction, therefore degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.85), F(3.4,74.86) = 6.81, p < 0.001.”

People sometimes report the chi-squared too but, tbh, I don’t know where that comes from and I’ve rarely seen it. Sometimes people don’t even put anything; they just say “Sphericity corrections using G-G were used when appropriate”, and it’s assumed that if you have a weird-ass DF, that is the reason why.

Simple Effects

What do the main effects mean??

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lme.formula(fixed = AccStandard ~ Year, data = flanker_long, 
##     random = ~1 | ID, method = "ML")
## 
## Linear Hypotheses:
##              Estimate Std. Error z value Pr(>|z|)    
## Y4 - Y3 == 0 0.008041   0.034576   0.233    0.971    
## Y5 - Y3 == 0 0.153247   0.034576   4.432   <1e-04 ***
## Y5 - Y4 == 0 0.145206   0.034576   4.200   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Ok, but what do all of those interaction effects mean? Let us explore with a simple effects test. Let’s just pretend, first, that we had a YEAR BY GROUP interaction.

First, split up into separate data sets.

Then, run ANOVAs on group within each year

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Group         2  1.295  0.6477   8.352 0.000377 ***
## Residuals   138 10.702  0.0775                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##              Df Sum Sq Mean Sq F value Pr(>F)
## Group         2  0.123 0.06171   0.662  0.517
## Residuals   138 12.865 0.09322
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Group         2  1.917  0.9584   12.88 7.43e-06 ***
## Residuals   138 10.267  0.0744                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Do a post hoc to find any significant differences between groups in each year

## 
## Study: Y5Out ~ "Group"
## 
## HSD Test for AccStandard 
## 
## Mean Square Error:  0.07754781 
## 
## Group,  means
## 
##         AccStandard       std  r    Min Max
## Control   0.6854810 0.2850832 48 0.0088   1
## Music     0.8555076 0.2571079 48 0.0162   1
## Sport     0.6303778 0.2929041 45 0.0244   1
## 
## Alpha: 0.05 ; DF Error: 138 
## Critical Value of Studentized Range: 3.350657 
## 
## Comparison between treatments means
## 
##                  difference pvalue signif.         LCL         UCL
## Control - Music -0.17002660 0.0092      ** -0.30470381 -0.03534939
## Control - Sport  0.05510326 0.6074         -0.08180017  0.19200670
## Music - Sport    0.22512986 0.0004     ***  0.08822643  0.36203329
## 
## Study: Y4Out ~ "Group"
## 
## HSD Test for AccStandard 
## 
## Mean Square Error:  0.09322311 
## 
## Group,  means
## 
##         AccStandard       std  r    Min    Max
## Control   0.5568437 0.3226933 48 0.0309 0.9955
## Music     0.5639854 0.3105934 48 0.0131 0.9787
## Sport     0.6235689 0.2794730 45 0.0091 0.9637
## 
## Alpha: 0.05 ; DF Error: 138 
## Critical Value of Studentized Range: 3.350657 
## 
## Comparison between treatments means
## 
##                   difference pvalue signif.        LCL        UCL
## Control - Music -0.007141667 0.9928         -0.1548045 0.14052115
## Control - Sport -0.066725139 0.5447         -0.2168288 0.08337856
## Music - Sport   -0.059583472 0.6157         -0.2096872 0.09052022
## 
## Study: Y3Out ~ "Group"
## 
## HSD Test for AccStandard 
## 
## Mean Square Error:  0.07440126 
## 
## Group,  means
## 
##         AccStandard       std  r        Min    Max
## Control   0.4537827 0.2955247 48 0.01727143 0.9988
## Music     0.7285660 0.2513071 48 0.09480000 1.0000
## Sport     0.5327533 0.2694414 45 0.01520000 0.9807
## 
## Alpha: 0.05 ; DF Error: 138 
## Critical Value of Studentized Range: 3.350657 
## 
## Comparison between treatments means
## 
##                 difference pvalue signif.         LCL         UCL
## Control - Music -0.2747832 0.0000     *** -0.40669985 -0.14286662
## Control - Sport -0.0789706 0.3462         -0.21306780  0.05512661
## Music - Sport    0.1958126 0.0021      **  0.06171543  0.32990984

Ok, now let’s actually look at our pretend YEAR * GROUP * CONDITION interaction

So keep those subsets.

Run a RM anova for group * condition within each year

## Warning: Converting "ID" to factor for ANOVA.
## Warning: You have removed one or more levels from variable "Condition".
## Refactoring for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified
## a well-considered value for the type argument to ezANOVA().
## $ANOVA
##            Effect DFn DFd         F            p p<.05        ges
## 2           Group   2  44 11.630578 8.813404e-05     * 0.17560790
## 3       Condition   2  88  3.890085 2.404979e-02     * 0.05014061
## 4 Group:Condition   4  88  3.253038 1.544515e-02     * 0.08112364
## 
## $`Mauchly's Test for Sphericity`
##            Effect         W         p p<.05
## 3       Condition 0.9632839 0.4474218      
## 4 Group:Condition 0.9632839 0.4474218      
## 
## $`Sphericity Corrections`
##            Effect       GGe      p[GG] p[GG]<.05      HFe      p[HF]
## 3       Condition 0.9645842 0.02550448         * 1.007974 0.02404979
## 4 Group:Condition 0.9645842 0.01671684         * 1.007974 0.01544515
##   p[HF]<.05
## 3         *
## 4         *
## Warning: Converting "ID" to factor for ANOVA.
## Warning: You have removed one or more levels from variable "Condition".
## Refactoring for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified
## a well-considered value for the type argument to ezANOVA().
## $ANOVA
##            Effect DFn DFd         F            p p<.05        ges
## 2           Group   2  44  1.099999 3.418501e-01       0.01763845
## 3       Condition   2  88 56.175094 1.898840e-16     * 0.45001672
## 4 Group:Condition   4  88  1.828751 1.304065e-01       0.05057993
## 
## $`Mauchly's Test for Sphericity`
##            Effect         W          p p<.05
## 3       Condition 0.7328087 0.00125112     *
## 4 Group:Condition 0.7328087 0.00125112     *
## 
## $`Sphericity Corrections`
##            Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF]
## 3       Condition 0.7891469 1.769696e-13         * 0.8135366 8.017251e-14
## 4 Group:Condition 0.7891469 1.471664e-01           0.8135366 1.451299e-01
##   p[HF]<.05
## 3         *
## 4
## Warning: Converting "ID" to factor for ANOVA.
## Warning: You have removed one or more levels from variable "Condition".
## Refactoring for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified
## a well-considered value for the type argument to ezANOVA().
## $ANOVA
##            Effect DFn DFd         F            p p<.05       ges
## 2           Group   2  44  9.013245 5.239145e-04     * 0.1466672
## 3       Condition   2  88  9.975428 1.245345e-04     * 0.1162974
## 4 Group:Condition   4  88 10.929622 3.062081e-07     * 0.2238323
## 
## $`Mauchly's Test for Sphericity`
##            Effect         W         p p<.05
## 3       Condition 0.9902836 0.8106449      
## 4 Group:Condition 0.9902836 0.8106449      
## 
## $`Sphericity Corrections`
##            Effect       GGe        p[GG] p[GG]<.05      HFe        p[HF]
## 3       Condition 0.9903771 1.322129e-04         * 1.036834 1.245345e-04
## 4 Group:Condition 0.9903771 3.441497e-07         * 1.036834 3.062081e-07
##   p[HF]<.05
## 3         *
## 4         *

So it looks like the group * condition effect is not the same in every year. Let’s split it up based on condition.

Now run the RM Anova for GROUP* YEAR within each condition

## Warning: Converting "ID" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified
## a well-considered value for the type argument to ezANOVA().
## $ANOVA
##       Effect DFn DFd         F            p p<.05       ges
## 2      Group   2  44 13.002999 3.655807e-05     * 0.1898688
## 3       Year   2  88 33.719352 1.344728e-11     * 0.3162242
## 4 Group:Year   4  88  9.305931 2.556283e-06     * 0.2033557
## 
## $`Mauchly's Test for Sphericity`
##       Effect         W          p p<.05
## 3       Year 0.8723394 0.05305636      
## 4 Group:Year 0.8723394 0.05305636      
## 
## $`Sphericity Corrections`
##       Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF]
## 3       Year 0.8867916 1.530853e-10         * 0.9213574 7.281456e-11
## 4 Group:Year 0.8867916 8.059822e-06         * 0.9213574 5.673412e-06
##   p[HF]<.05
## 3         *
## 4         *

and you continue until you figure it out!

Post Hocs

We did this above, but here it is again. Y5Out is the name of an AOV object.

## 
## Study: Y5Out ~ "Group"
## 
## HSD Test for AccStandard 
## 
## Mean Square Error:  0.07754781 
## 
## Group,  means
## 
##         AccStandard       std  r    Min Max
## Control   0.6854810 0.2850832 48 0.0088   1
## Music     0.8555076 0.2571079 48 0.0162   1
## Sport     0.6303778 0.2929041 45 0.0244   1
## 
## Alpha: 0.05 ; DF Error: 138 
## Critical Value of Studentized Range: 3.350657 
## 
## Comparison between treatments means
## 
##                  difference pvalue signif.         LCL         UCL
## Control - Music -0.17002660 0.0092      ** -0.30470381 -0.03534939
## Control - Sport  0.05510326 0.6074         -0.08180017  0.19200670
## Music - Sport    0.22512986 0.0004     ***  0.08822643  0.36203329

You can also do a simple t.test

## 
##  Welch Two Sample t-test
## 
## data:  Y3$AccStandard and Y4$AccStandard
## t = -0.22517, df = 279.71, p-value = 0.822
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.07833500  0.06225327
## sample estimates:
## mean of x mean of y 
## 0.5725293 0.5805702

Linear Growth Model

This way of looking at the data is good because it will let you include people who don’t have data for all years. It requires a very specific format of data (kind of like our easy long-form transform thingy above).

We are going to use a NEW data set. It has LOTS of missing datapoints, and isn’t in the best format, so you can see how we fix that!
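The reshaping step can be done several ways (reshape2, tidyr); a base-R sketch with stats::reshape, on mocked-up columns shaped like this dataset’s backwards_1 … backwards_5:

```r
# Mocked-up wide data: one row per subject, one column per year
wide <- data.frame(
  subject_id  = c("s1", "s2", "s3"),
  backwards_1 = c(2, 3, 4),
  backwards_2 = c(3, 4, 5),
  backwards_3 = c(4, 5, 6)
)

# Stack the year columns into one "backwards" column plus a "year" index
long <- reshape(wide,
                varying   = c("backwards_1", "backwards_2", "backwards_3"),
                v.names   = "backwards",
                timevar   = "year",
                idvar     = "subject_id",
                direction = "long")
nrow(long)  # 3 subjects x 3 years = 9 rows
```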

Let’s check missing data rates.

##   subject_id            group           age_1           age_2       
##  Length:66          Min.   :1.000   Min.   :77.00   Min.   : 88.00  
##  Class :character   1st Qu.:1.000   1st Qu.:79.00   1st Qu.: 92.00  
##  Mode  :character   Median :2.000   Median :82.00   Median : 94.00  
##                     Mean   :2.046   Mean   :82.43   Mean   : 94.34  
##                     3rd Qu.:3.000   3rd Qu.:85.00   3rd Qu.: 96.00  
##                     Max.   :3.000   Max.   :88.00   Max.   :101.00  
##                     NA's   :1       NA's   :1       NA's   :1       
##      age_3           age_4           age_5        backwards_1   
##  Min.   : 99.0   Min.   :110.0   Min.   :121.0   Min.   :1.000  
##  1st Qu.:104.0   1st Qu.:116.0   1st Qu.:128.0   1st Qu.:2.000  
##  Median :107.0   Median :119.0   Median :131.0   Median :3.000  
##  Mean   :106.3   Mean   :118.4   Mean   :130.6   Mean   :3.492  
##  3rd Qu.:109.0   3rd Qu.:121.0   3rd Qu.:133.0   3rd Qu.:5.000  
##  Max.   :113.0   Max.   :126.0   Max.   :138.0   Max.   :6.000  
##  NA's   :1       NA's   :1       NA's   :1       NA's   :1      
##   backwards_2     backwards_3    backwards_4     backwards_5   
##  Min.   :2.000   Min.   :3.00   Min.   :3.000   Min.   :3.000  
##  1st Qu.:3.000   1st Qu.:3.00   1st Qu.:3.000   1st Qu.:4.000  
##  Median :4.000   Median :4.00   Median :5.000   Median :5.000  
##  Mean   :4.031   Mean   :4.46   Mean   :4.508   Mean   :4.627  
##  3rd Qu.:5.000   3rd Qu.:6.00   3rd Qu.:5.000   3rd Qu.:5.000  
##  Max.   :6.000   Max.   :6.00   Max.   :6.000   Max.   :6.000  
##  NA's   :1       NA's   :3      NA's   :5       NA's   :7      
##    forward_1        forward_2       forward_3      forward_4     
##  Min.   : 4.000   Min.   : 5.00   Min.   : 5.0   Min.   : 5.000  
##  1st Qu.: 5.000   1st Qu.: 6.00   1st Qu.: 7.0   1st Qu.: 7.000  
##  Median : 7.000   Median : 8.00   Median : 9.0   Median :10.000  
##  Mean   : 7.033   Mean   : 7.80   Mean   : 9.1   Mean   : 9.441  
##  3rd Qu.: 9.000   3rd Qu.: 9.25   3rd Qu.:11.0   3rd Qu.:12.000  
##  Max.   :10.000   Max.   :10.00   Max.   :13.0   Max.   :13.000  
##  NA's   :6        NA's   :6       NA's   :6      NA's   :7       
##    forward_5     
##  Min.   : 5.000  
##  1st Qu.: 7.000  
##  Median : 9.000  
##  Mean   : 8.966  
##  3rd Qu.:11.000  
##  Max.   :13.000  
##  NA's   :8

Put it in longform, using our fancy method.

Here, we added 2 extra variables to help with interpretation of our model coefficients.

cond -> we reordered our groups so that control is the first level, followed by music and sport. Why? When we do our models, this reordering makes the intercept (\(\beta_0\)) correspond to the control group. Since group is a factor with 3 levels, linear models give every level except the first a dummy variable. The level with no dummy variable (control) becomes our baseline, so the coefficients (\(\beta_1\), \(\beta_2\)) associated with music and sport are the changes in your variable of interest relative to control. See below for examples.

time0 -> We subtracted 1 from our time variable (the year of the study, e.g. Year 1 = 1, Year 2 = 2), so that Year 1 becomes time 0. We did this so that the intercept would be the mean at time 0 (i.e., Year 1).

By changing these 2 variables, we make the intercept coefficient reflect the estimated outcome score (mean) for the control group at Year 1. Now our intercept is more meaningful and the results of the linear growth models are easier to interpret.
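As a sketch, the recoding might look like this (the raw column name time and the lowercase group labels are assumptions; match them to your actual data):

```r
# put control first so it becomes the baseline (non-dummy-coded) level
digit_span_long$cond <- factor(digit_span_long$cond,
                               levels = c("control", "music", "sport"))

# shift time so Year 1 = 0; the intercept is then the mean at Year 1
digit_span_long$time0 <- digit_span_long$time - 1
```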

Distributions

## Models

Forwards, by time
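The output below comes from a call like this (reconstructed from the Formula: line in the output; the object name forward_time is made up):

```r
library(lmerTest) # lmer() with Satterthwaite t-tests

forward_time <- lmer(forward ~ cond * time0 + (time0 | subject_id),
                     data = digit_span_long)
summary(forward_time)
```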

## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: forward ~ cond * time0 + (time0 | subject_id)
##    Data: digit_span_long
## 
## REML criterion at convergence: 1348.3
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.81212 -0.85073  0.03932  0.74603  2.30620 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev. Corr 
##  subject_id (Intercept) 0.1567   0.3959        
##             time0       0.2115   0.4599   -1.00
##  Residual               4.9054   2.2148        
## Number of obs: 297, groups:  subject_id, 62
## 
## Fixed effects:
##                  Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)       7.36955    0.38582 151.21853  19.101  < 2e-16 ***
## condmusic        -0.04576    0.55924 150.92087  -0.082  0.93489    
## condsport         0.01388    0.55030 155.36021   0.025  0.97991    
## time0             0.51847    0.18658  68.46488   2.779  0.00704 ** 
## condmusic:time0   0.01037    0.26926  67.55687   0.039  0.96938    
## condsport:time0   0.10759    0.26209  69.42060   0.410  0.68271    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cndmsc cndspr time0  cndm:0
## condmusic   -0.690                            
## condsport   -0.701  0.484                     
## time0       -0.787  0.543  0.552              
## condmsc:tm0  0.545 -0.788 -0.382 -0.693       
## cndsprt:tm0  0.560 -0.386 -0.788 -0.712  0.493
## convergence code: 0
## boundary (singular) fit: see ?isSingular

Intercept is the average forwards score for the control group at Year 1. (see above for explanation on how we made it so the intercept would be interpreted this way).

At Year 1, the difference between the music mean and the control mean (the intercept, 7.37) is -0.05, so the music mean at Year 1 is 7.37 + (-0.05) = 7.32.

Forwards, by age

## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: forward ~ cond * age_in_yr5 + (age_in_yr5 | subject_id)
##    Data: digit_span_long
## 
## REML criterion at convergence: 1349.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.86334 -0.83188  0.01543  0.73994  2.32325 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev. Corr 
##  subject_id (Intercept) 1.4645   1.2102        
##             age_in_yr5  0.1921   0.4382   -1.00
##  Residual               4.9671   2.2287        
## Number of obs: 297, groups:  subject_id, 62
## 
## Fixed effects:
##                        Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)           6.3977530  0.6792200 88.9590207   9.419 5.11e-15 ***
## condmusic            -0.0551157  0.9906320 88.9388735  -0.056  0.95576    
## condsport             0.0046551  0.9638540 91.7058521   0.005  0.99616    
## age_in_yr5            0.5173504  0.1812443 66.4550439   2.854  0.00574 ** 
## condmusic:age_in_yr5 -0.0005214  0.2631795 66.6541075  -0.002  0.99843    
## condsport:age_in_yr5  0.0637871  0.2553870 67.7088557   0.250  0.80352    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cndmsc cndspr ag_n_5 cndm:__5
## condmusic   -0.686                              
## condsport   -0.705  0.483                       
## age_in_yr5  -0.936  0.642  0.660                
## cndmsc:g__5  0.645 -0.938 -0.454 -0.689         
## cndsprt:__5  0.665 -0.456 -0.936 -0.710  0.489  
## convergence code: 0
## boundary (singular) fit: see ?isSingular

Backwards, by time

## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: backwards ~ cond * time0 + (time0 | subject_id)
##    Data: digit_span_long
## 
## REML criterion at convergence: 1081.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.06994 -0.90131  0.06774  0.79287  1.76036 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev. Corr 
##  subject_id (Intercept) 0.22128  0.4704        
##             time0       0.02751  0.1659   -1.00
##  Residual               1.72844  1.3147        
## Number of obs: 313, groups:  subject_id, 65
## 
## Fixed effects:
##                 Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)      3.80161    0.23509 62.93213  16.171   <2e-16 ***
## condmusic       -0.08691    0.34430 62.62945  -0.252   0.8015    
## condsport       -0.30973    0.33582 62.64078  -0.922   0.3599    
## time0            0.20629    0.09584 88.45981   2.152   0.0341 *  
## condmusic:time0  0.05640    0.13957 86.65475   0.404   0.6871    
## condsport:time0  0.15908    0.13653 87.43059   1.165   0.2471    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cndmsc cndspr time0  cndm:0
## condmusic   -0.683                            
## condsport   -0.700  0.478                     
## time0       -0.837  0.571  0.586              
## condmsc:tm0  0.575 -0.837 -0.402 -0.687       
## cndsprt:tm0  0.587 -0.401 -0.838 -0.702  0.482
## convergence code: 0
## boundary (singular) fit: see ?isSingular

Backwards, by age

## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: backwards ~ cond * age_in_yr5 + (age_in_yr5 | subject_id)
##    Data: digit_span_long
## 
## REML criterion at convergence: 1081.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.1004 -0.8688  0.0534  0.8110  1.8042 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev. Corr 
##  subject_id (Intercept) 0.61809  0.7862        
##             age_in_yr5  0.02715  0.1648   -1.00
##  Residual               1.72753  1.3144        
## Number of obs: 313, groups:  subject_id, 65
## 
## Fixed effects:
##                      Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)           3.39794    0.39093 64.93891   8.692 1.74e-12 ***
## condmusic            -0.15422    0.57206 64.77860  -0.270   0.7883    
## condsport            -0.53150    0.55745 63.95860  -0.953   0.3440    
## age_in_yr5            0.20946    0.09326 88.18342   2.246   0.0272 *  
## condmusic:age_in_yr5  0.04631    0.13628 87.71905   0.340   0.7348    
## condsport:age_in_yr5  0.14042    0.13324 87.98418   1.054   0.2948    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cndmsc cndspr ag_n_5 cndm:__5
## condmusic   -0.683                              
## condsport   -0.701  0.479                       
## age_in_yr5  -0.944  0.645  0.662                
## cndmsc:g__5  0.646 -0.944 -0.453 -0.684         
## cndsprt:__5  0.661 -0.451 -0.944 -0.700  0.479  
## convergence code: 0
## boundary (singular) fit: see ?isSingular

Results

Results of Growth Modeling

                                        Forwards   Forwards   Backwards  Backwards
                                        by time    by age     by time    by age
(Intercept)                              7.37***    6.40***    3.80***    3.40***
                                        (0.39)     (0.68)     (0.24)     (0.39)
condmusic                               -0.05      -0.06      -0.09      -0.15
                                        (0.56)     (0.99)     (0.34)     (0.57)
condsport                                0.01       0.00      -0.31      -0.53
                                        (0.55)     (0.96)     (0.34)     (0.56)
time0                                    0.52**                0.21*
                                        (0.19)                (0.10)
condmusic:time0                          0.01                  0.06
                                        (0.27)                (0.14)
condsport:time0                          0.11                  0.16
                                        (0.26)                (0.14)
age_in_yr5                                          0.52**                0.21*
                                                   (0.18)                (0.09)
condmusic:age_in_yr5                               -0.00                  0.05
                                                   (0.26)                (0.14)
condsport:age_in_yr5                                0.06                  0.14
                                                   (0.26)                (0.13)
AIC                                     1368.34    1369.59    1101.10    1101.77
BIC                                     1405.27    1406.53    1138.56    1139.23
Log Likelihood                          -674.17    -674.79    -540.55    -540.88
Num. obs.                                297        297        313        313
Num. groups: subject_id                  62         62         65         65
Var: subject_id (Intercept)              0.16       1.46       0.22       0.62
Var: subject_id time0                    0.21                  0.03
Cov: subject_id (Intercept) time0       -0.18                 -0.08
Var: Residual                            4.91       4.97       1.73       1.73
Var: subject_id age_in_yr5                          0.19                  0.03
Cov: subject_id (Intercept) age_in_yr5             -0.53                 -0.13

***p < 0.001, **p < 0.01, *p < 0.05

But what does it mean??? This is a nice table that puts all 4 models above together.
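A side-by-side model table like this one is typically built with texreg (which is loaded above); the model object names here are assumptions:

```r
library(texreg)

# screenreg() prints to the console; htmlreg() makes the HTML version shown here
screenreg(list(forward_time, forward_age, backward_time, backward_age),
          custom.model.names = c("Forwards by time", "Forwards by age",
                                 "Backwards by time", "Backwards by age"))
```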

Correlations

Simple

A simple correlation
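The output below is from cor.test() (the data: line in the output gives away the call):

```r
cor.test(music$WASI, music$Y4_Total_AccStandard)
```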

## 
##  Pearson's product-moment correlation
## 
## data:  music$WASI and music$Y4_Total_AccStandard
## t = -1.5745, df = 17, p-value = 0.1338
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6978707  0.1163155
## sample estimates:
##        cor 
## -0.3567409

How to report

"In the music group, WASI scores and flanker accuracy were not significantly correlated, r(16[this is your n]) = -0.36, p > 0.05.

Thats great, but not really that helpful.

Lots at Once

ALISON HELP! how do we get the p values and stuff in a nice little tables and stuff like that
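One option while we wait for Alison (a sketch, not necessarily her answer): psych::corr.test() returns matrices of correlations, sample sizes, and p values all at once. Using the built-in mtcars data as a stand-in:

```r
library(psych)

ct <- corr.test(mtcars[, c("mpg", "hp", "wt")])
ct$r # correlation matrix
ct$n # sample sizes
ct$p # p values (above the diagonal: adjusted for multiple tests)
```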

Adding a Co-Variate (partial correlation)

ALISON HELP!
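A possible starting point in the meantime: psych::partial.r() computes partial correlations. Here, the first two columns of the stand-in mtcars data are correlated while controlling for the third:

```r
library(psych)

# correlate mpg and hp, partialling out wt
partial.r(mtcars[, c("mpg", "hp", "wt")], x = 1:2, y = 3)
```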

Correcting for Multiple Comparisons

ALISON, HELP AGAIN!
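In the meantime, base R's p.adjust() handles the common corrections:

```r
p <- c(0.001, 0.01, 0.03, 0.20) # made-up p values

p.adjust(p, method = "bonferroni") # most conservative
p.adjust(p, method = "holm")       # step-down Bonferroni
p.adjust(p, method = "fdr")        # false discovery rate
```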

Regression

Please see the accompanying R notebook "Statistical Learning.Rmd", curated by Priscilla. In the meantime, here are a few things.

Simple regression
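The two outputs below come from printing the model and then its summary (the object name fit is made up):

```r
fit <- lm(WASI ~ AccStandard, data = flanker_long)
fit          # just the coefficients
summary(fit) # full coefficient table, R-squared, F-statistic
```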

## 
## Call:
## lm(formula = WASI ~ AccStandard, data = flanker_long)
## 
## Coefficients:
## (Intercept)  AccStandard  
##     101.275        3.842
## 
## Call:
## lm(formula = WASI ~ AccStandard, data = flanker_long)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -49.116 -12.845   0.489   9.406  51.545 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  101.275      2.139  47.352   <2e-16 ***
## AccStandard    3.842      3.071   1.251    0.212    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.24 on 421 degrees of freedom
## Multiple R-squared:  0.003703,   Adjusted R-squared:  0.001337 
## F-statistic: 1.565 on 1 and 421 DF,  p-value: 0.2116

What does the Multiple R-squared stand for?

In general, R-squared is the percentage of variation in your response variable explained by the model, or how close the data are to the regression line. When you add predictors to your model, the multiple R-squared will always increase, since every predictor explains at least some portion of the variance.

Adjusted R-squared controls for this increase by adding a penalty for the number of predictors in the model. It therefore balances the most parsimonious model against the best-fitting model. Generally, a large difference between your multiple and adjusted R-squared indicates you may have overfit your model.

How do we interpret the slope for AccStandard? If AccStandard increases by 1, predicted WASI increases by 3.842.

do we reject null hypothesis?

No.

How do we interpret the intercept? Is the intercept meaningful in this case? When a predictor variable does not have a meaningful 0, it is often preferable to center the predictor variable around its mean.

When AccStandard is 0, predicted WASI is 101.275. Accuracy will never be zero here, though (or it's very unlikely), so the intercept doesn't really mean anything.

We can center it around the mean
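The centering itself is one line (wasi2 is the name used in the model output below):

```r
# subtract the mean so that 0 = average WASI
flanker_long$wasi2 <- flanker_long$WASI - mean(flanker_long$WASI, na.rm = TRUE)
```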

## 
## Call:
## lm(formula = wasi2 ~ AccStandard, data = flanker_long)
## 
## Coefficients:
## (Intercept)  AccStandard  
##      -2.406        3.842
## 
## Call:
## lm(formula = wasi2 ~ AccStandard, data = flanker_long)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -49.116 -12.845   0.489   9.406  51.545 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -2.406      2.139  -1.125    0.261
## AccStandard    3.842      3.071   1.251    0.212
## 
## Residual standard error: 19.24 on 421 degrees of freedom
## Multiple R-squared:  0.003703,   Adjusted R-squared:  0.001337 
## F-statistic: 1.565 on 1 and 421 DF,  p-value: 0.2116

Confidence interval
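The interval below comes from confint(); assuming the centered model is stored as fit2 (name is a guess):

```r
confint(fit2) # 95% confidence intervals for the coefficients, by default
```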

##                 2.5 %   97.5 %
## (Intercept) -6.609924 1.798107
## AccStandard -2.194658 9.877683

Simple regression with categorical variables

Basically this means that the reference group is the first level: the intercept is the mean of the first group, and each other coefficient compares that level to the reference group.

You have to be careful about what you choose as your reference group. Sometimes it might not even make sense to have a reference group (for example, if you’re comparing genders etc.) You might just want to use the largest group as the reference group in that case. But, something to be aware of.
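To set the reference group explicitly (here Music, which matches the intercept in the output below), relevel() works:

```r
flanker_long$Group <- relevel(factor(flanker_long$Group), ref = "Music")
```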

## 
## Call:
## lm(formula = AccStandard ~ Group, data = flanker_long)
## 
## Coefficients:
##  (Intercept)    GroupSport  GroupControl  
##       0.7160       -0.1205       -0.1507
## 
## Call:
## lm(formula = AccStandard ~ Group, data = flanker_long)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.70292 -0.23952  0.06321  0.27438  0.43463 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.71602    0.02488  28.776  < 2e-16 ***
## GroupSport   -0.12045    0.03577  -3.367 0.000829 ***
## GroupControl -0.15065    0.03519  -4.281 2.31e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2986 on 420 degrees of freedom
## Multiple R-squared:  0.04638,    Adjusted R-squared:  0.04184 
## F-statistic: 10.21 on 2 and 420 DF,  p-value: 4.663e-05

Linear Mixed Effects Model

This is a way of computing anova-like stats based in regression methods. It allows for baseline variability, which is pretty great.
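The model below is a random-intercept model fit with lmerTest's lmer() (call reconstructed from the Formula: line in the output; the object name m is made up):

```r
library(lmerTest)

m <- lmer(AccStandard ~ Group * Year + (1 | ID), data = flanker_long)
summary(m)
```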

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: AccStandard ~ Group * Year + (1 | ID)
##    Data: flanker_long
## 
## REML criterion at convergence: 172.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9280 -0.7791  0.1746  0.7776  1.9086 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  ID       (Intercept) 0.0001739 0.01319 
##  Residual             0.0815577 0.28558 
## Number of obs: 423, groups:  ID, 47
## 
## Fixed effects:
##                      Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)           0.72857    0.04135 264.83967  17.619  < 2e-16 ***
## GroupSport           -0.19581    0.05945 264.83967  -3.294  0.00112 ** 
## GroupControl         -0.27478    0.05848 264.83967  -4.699 4.21e-06 ***
## YearY4               -0.16458    0.05829 369.99930  -2.823  0.00501 ** 
## YearY5                0.12694    0.05829 369.99930   2.178  0.03007 *  
## GroupSport:YearY4     0.25540    0.08380 369.99931   3.048  0.00247 ** 
## GroupControl:YearY4   0.26764    0.08244 369.99930   3.246  0.00128 ** 
## GroupSport:YearY5    -0.02932    0.08380 369.99931  -0.350  0.72666    
## GroupControl:YearY5   0.10476    0.08244 369.99930   1.271  0.20464    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) GrpSpr GrpCnt YearY4 YearY5 GS:YY4 GC:YY4 GS:YY5
## GroupSport  -0.696                                                 
## GroupContrl -0.707  0.492                                          
## YearY4      -0.705  0.490  0.498                                   
## YearY5      -0.705  0.490  0.498  0.500                            
## GrpSprt:YY4  0.490 -0.705 -0.347 -0.696 -0.348                     
## GrpCntr:YY4  0.498 -0.347 -0.705 -0.707 -0.354  0.492              
## GrpSprt:YY5  0.490 -0.705 -0.347 -0.348 -0.696  0.500  0.246       
## GrpCntr:YY5  0.498 -0.347 -0.705 -0.354 -0.707  0.246  0.500  0.492

Assumptions

If there is an interaction effect, you can look at the groups individually.

Robust Linear Mixed Effects model

This model DOES NOT rely on the standard assumptions that lmer does. Documentation and simulation studies can be found here, from Koller 2016.

This package is very new. bootMer, MCMC, and Kenward-Roger approximations for degrees of freedom and p values are not yet configured, and I am not aware of any other program that can compute them for this model. But! I can write my own function! Working on that.
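The fit itself mirrors the earlier lmer() call, just with robustlmm's rlmer() (object name assumed):

```r
library(robustlmm)

m_rob <- rlmer(AccStandard ~ Group * Year + (1 | ID), data = flanker_long)
summary(m_rob)
```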

## Robust linear mixed model fit by DAStau 
## Formula: AccStandard ~ Group * Year + (1 | ID) 
##    Data: flanker_long 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8172 -0.7329  0.1446  0.6869  1.8206 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 0.00000  0.000   
##  Residual             0.09427  0.307   
## Number of obs: 423, groups: ID, 47
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          0.74271    0.04545  16.340
## GroupSport          -0.20671    0.06534  -3.163
## GroupControl        -0.30292    0.06428  -4.712
## YearY4              -0.16726    0.06428  -2.602
## YearY5               0.13849    0.06428   2.154
## GroupSport:YearY4    0.27355    0.09241   2.960
## GroupControl:YearY4  0.29206    0.09091   3.213
## GroupSport:YearY5   -0.03246    0.09241  -0.351
## GroupControl:YearY5  0.12181    0.09091   1.340
## 
## Correlation of Fixed Effects:
##             (Intr) GrpSpr GrpCnt YearY4 YearY5 GS:YY4 GC:YY4 GS:YY5
## GroupSport  -0.696                                                 
## GroupContrl -0.707  0.492                                          
## YearY4      -0.707  0.492  0.500                                   
## YearY5      -0.707  0.492  0.500  0.500                            
## GrpSprt:YY4  0.492 -0.707 -0.348 -0.696 -0.348                     
## GrpCntr:YY4  0.500 -0.348 -0.707 -0.707 -0.354  0.492              
## GrpSprt:YY5  0.492 -0.707 -0.348 -0.348 -0.696  0.500  0.246       
## GrpCntr:YY5  0.500 -0.348 -0.707 -0.354 -0.707  0.246  0.500  0.492
## 
## Robustness weights for the residuals: 
##  345 weights are ~= 1. The remaining 78 ones are summarized as
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.477   0.753   0.872   0.857   0.968   0.998 
## 
## Robustness weights for the random effects: 
##  All 47 weights are ~= 1.
## 
## Rho functions used for fitting:
##   Residuals:
##     eff: smoothed Huber (k = 1.345, s = 10) 
##     sig: smoothed Huber, Proposal II (k = 1.345, s = 10) 
##   Random Effects, variance component 1 (ID):
##     eff: smoothed Huber (k = 1.345, s = 10) 
##     vcp: smoothed Huber, Proposal II (k = 1.345, s = 10)

Growth Model

With more than 2 time points, we can incorporate a robust latent growth analysis, so that each participant is allowed an individual trajectory over time. Field and Wilcox, 2017. Incorporating the growth model also gives us p values and confidence intervals!

I am still learning how to use this!! Come back for complete notes.

Factor Analysis

Coming soon!

Mediation analysis

Also coming soon!

Non-Parametric Tests

NOTE: this is from an earlier version of this notebook. Feel free to study the below non-parametric tests. But again, these are still considered “classic methods” and have considerable pitfalls. Please use caution. I recommend deep-diving into Rand’s book from the link above. Again, this is not a stats course.

ALL of the above tests make several assumptions about the data used:

1. Normality: data have a normal distribution
2. Homogeneity of Variance: variances are the same between groups
3. Linearity: data have a linear relationship
4. Independence: data are independent

However, this may not be true for all data you may encounter. Below are some commonly-used non-parametric tests for those instances that the assumptions above are violated.

Wilcoxon Signed-Rank Test

This test is a rank-based test. So, instead of raw scores, data are transformed into ranks. It begins by calculating the difference between the observed values and the "default value". This is for data that are INTERVAL or RATIO, not ORDINAL, and only for one-sample (or paired) data. Note: the default value is something like a middle, neutral value, or a previously published value.

The null hypothesis is that the data are symmetric around the "default value", and thus that the two samples come from identical populations.

Parametric equivalent: paired t-test
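The output below (including the two medians printed at the end, which look like [1] 82 and [1] 94) probably came from something like:

```r
wilcox.test(digit_span$age_1, digit_span$age_2, paired = TRUE)
median(digit_span$age_1, na.rm = TRUE)
median(digit_span$age_2, na.rm = TRUE)
```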

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  digit_span$age_1 and digit_span$age_2
## V = 0, p-value = 1.19e-12
## alternative hypothesis: true location shift is not equal to 0
## [1] 82
## [1] 94

What does it mean?

V corresponds to the sum of the ranks assigned to the differences.

I’m not really sure how to interpret this. I don’t understand this test well enough. To be learned!!!

How to report

"A Wilcoxon signed-rank test indicated that age at Year 2 (Mdn = 94) was greater than age at Year 1 (Mdn = 82), V = 0, p < 0.001."

Kruskal-Wallis Test

The parametric equivalent to this one is a 1-way ANOVA

Null hypothesis is that groups are equal on some dependent variable.

We will use our flanker data to explore if groups differ on accuracy
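The output below comes from the formula interface of kruskal.test():

```r
kruskal.test(AccStandard ~ Group, data = flanker_long)
```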

## 
##  Kruskal-Wallis rank sum test
## 
## data:  AccStandard by Group
## Kruskal-Wallis chi-squared = 24.274, df = 2, p-value = 5.359e-06

How to report

"There was a significant difference between groups on accuracy (H(2) = 24.274, p < 0.001)."

Friedman Test

Parametric equivalent: Repeated Measures ANOVA

## 
##  Friedman rank sum test
## 
## data:  cbind(flanker_long$WASI, flanker_long$Group, flanker_long$Year)
## Friedman chi-squared = 692.18, df = 2, p-value < 2.2e-16

What does it mean?

Again, I don’t really understand how this one works. To be learned!

How to report

“A non-parametric Friedman test of differences among repeated measures was conducted and rendered a Chi-square value of 692.18, which was significant (p < 0.001).”

Data Visualization

Now, to the fun part. R is SO SO GREAT for doing data visualization. It takes a little getting used to, but then, your options are literally limitless. Below is some of what you can do but, as always, there is so much more we have yet to learn.

Let’s start with tables, since we’ve been doing so much with data above.

Tables

kables (HTML)

Kable is used to make really pretty tables in HTML format. These look nice in knitted R Markdown documents like this one. However, I don't think you can put them in grobs (below), so we will continue with grob-style tables later.

TBH……. the document below does a better job of explaining kable than I ever will. So… here is the link!

but, in the meantime, here is a simple example:
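A minimal sketch of the kind of call that produces a table like the one below, assuming the summary data frame is called flankersum (name is a guess):

```r
library(knitr)
library(kableExtra)

kable(flankersum) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
```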

Group Condition Year N mean sd se
Music Congruent Y3 16 683.4739 131.25684 32.81421
Music Congruent Y4 16 603.9375 125.82827 31.45707
Music Congruent Y5 16 577.2344 127.44079 31.86020
Music Incongruent Y3 16 750.6588 102.77855 25.69464
Music Incongruent Y4 16 651.1042 145.37000 36.34250
Music Incongruent Y5 16 619.1771 133.76510 33.44127
Music Neutral Y3 16 723.4173 119.32385 29.83096
Music Neutral Y4 16 547.7054 125.41262 31.35315
Music Neutral Y5 16 503.1964 121.37974 30.34494
Sport Congruent Y3 15 689.5363 139.11747 35.91998
Sport Congruent Y4 15 695.0056 117.13139 30.24319
Sport Congruent Y5 15 624.7333 111.77598 28.86043
Sport Incongruent Y3 15 740.8833 91.40338 23.60025
Sport Incongruent Y4 15 749.3544 179.26647 46.28640
Sport Incongruent Y5 15 668.4333 100.05226 25.83338
Sport Neutral Y3 15 771.7964 109.38517 28.24313
Sport Neutral Y4 15 609.0857 110.96122 28.65006
Sport Neutral Y5 15 525.0952 78.39185 20.24069
Control Congruent Y3 16 667.3185 105.34082 26.33520
Control Congruent Y4 16 599.5625 87.32704 21.83176
Control Congruent Y5 16 533.2031 64.51687 16.12922
Control Incongruent Y3 16 724.1630 99.52525 24.88131
Control Incongruent Y4 16 639.4479 132.49195 33.12299
Control Incongruent Y5 16 565.7188 80.83267 20.20817
Control Neutral Y3 16 693.0483 119.40328 29.85082
Control Neutral Y4 16 545.7946 109.17146 27.29287
Control Neutral Y5 16 470.0804 91.85918 22.96479

Again, you can do all sorts of cool stuff like grouping by a variable, coloring, footnotes, etc. Look at that little page, it is great.

Stargazer

This is fun. But really, it is better when you are making sweave + pdf type documents.

Summary
Group Condition Year N mean sd se
Music Congruent Y3 16 683.474 131.257 32.814
Music Congruent Y4 16 603.938 125.828 31.457
Music Congruent Y5 16 577.234 127.441 31.860
Music Incongruent Y3 16 750.659 102.779 25.695
Music Incongruent Y4 16 651.104 145.370 36.342
Music Incongruent Y5 16 619.177 133.765 33.441
Music Neutral Y3 16 723.417 119.324 29.831
Music Neutral Y4 16 547.705 125.413 31.353
Music Neutral Y5 16 503.196 121.380 30.345
Sport Congruent Y3 15 689.536 139.117 35.920
Sport Congruent Y4 15 695.006 117.131 30.243
Sport Congruent Y5 15 624.733 111.776 28.860
Sport Incongruent Y3 15 740.883 91.403 23.600
Sport Incongruent Y4 15 749.354 179.266 46.286
Sport Incongruent Y5 15 668.433 100.052 25.833
Sport Neutral Y3 15 771.796 109.385 28.243
Sport Neutral Y4 15 609.086 110.961 28.650
Sport Neutral Y5 15 525.095 78.392 20.241
Control Congruent Y3 16 667.319 105.341 26.335
Control Congruent Y4 16 599.562 87.327 21.832
Control Congruent Y5 16 533.203 64.517 16.129
Control Incongruent Y3 16 724.163 99.525 24.881
Control Incongruent Y4 16 639.448 132.492 33.123
Control Incongruent Y5 16 565.719 80.833 20.208
Control Neutral Y3 16 693.048 119.403 29.851
Control Neutral Y4 16 545.795 109.171 27.293
Control Neutral Y5 16 470.080 91.859 22.965

APA Tables

makes a cute little document that is formatted perfectly for APA!

There is a lot more you can do, but here are a few simple things.

For the purposes of demonstration, I have included the images of what the output looks like. (they are also .doc files in the directory)

Correlation table!!!!

Regression table!!

1 way ANOVA table!!

Descriptive stats to follow that up with.

To make a table from an ezANOVA object, use apa.ezANOVA.table().

Graphs and Charts

Histograms and Density

Histograms are like bar graphs, but the bars touch: each bar covers a bin of a continuous variable. They are good for looking at things like distributions, normality, etc. We are going to use them to look at WASI scores, here.

You can use wide format data for these.

Let’s start by looking just at WASI Scores Overall
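A minimal sketch, assuming the wide data frame is called flanker (name is a guess):

```r
ggplot(flanker, aes(x = WASI)) +
  geom_histogram(binwidth = 5, colour = "black", fill = "grey70")
```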


Another way, with colors


With groups

Bar Graphs

Categorical

Let’s do a chart for gender. Basic basic

With percentages

We are going to use fake data for this


Non-Categorical


Great. Now, let’s look at some between conditions.


And… between years and conditions


Reeeally quickly, let's just look at some paired data, and how we switched left and right, as appropriate. This is cortical thickness data, with Left & Right hemispheres. This is actually a summary spreadsheet already, saved as an xlsx.

## New names:
## * `` -> ...1

Another type: This is Assal’s preferred scheme

Note the manual changes made

pd <- position_dodge(0.6)

ggplot(flankersum_acc, aes(x = Group, y = mean, fill = Condition)) +
  geom_bar(position = pd, width = .9, stat = "identity", alpha = 0.5) + # alpha changes transparency
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), colour = "white", width = .1, position = pd) +
  ylab("Accuracy") +
  ggtitle("Flanker Fish") +
  scale_x_discrete(breaks = c("1", "2", "3"),
                   labels = c("Music", "Sport", "Control")) + # rename your ticks!
  scale_fill_manual(name = "Condition",
                    breaks = c("Congruent", "Incongruent", "Neutral"), # what they ARE called
                    labels = c("Congruent!", "Incongruent!", "Neutral!"), # what you WANT THEM TO BE called
                    values = c("yellow", "cyan", "pink")) + # choose your own colors!
  theme_bw() +
  theme(axis.line = element_line(size = 1, colour = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(), panel.background = element_blank(),
        text = element_text(family = "Times", colour = "white"),
        plot.background = element_rect(fill = "black", colour = "black"),
        legend.background = element_blank(),
        legend.box.background = element_blank(),
        legend.key = element_rect(fill = "black"),
        plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(colour = "white", size = 10, angle = 45), # angle makes an angle!
        axis.text.y = element_text(colour = "white", size = 10)) +
  facet_grid(. ~ Year, labeller = as_labeller(c("Y3" = "Year 3", "Y4" = "Year 4", "Y5" = "Year 5")), scales = "free") + # change names of facets
  theme(strip.background = element_rect(fill = "gray28"), # change the colors of your facet boxes
        strip.text = element_text(colour = "white")) # change text of facets

Line Graphs

Now, onto line graphs! yay! Line graphs are really great at showing how things change over time.

Let’s start with a simple line graph. All groups, over time
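A sketch of a simple group-by-year line graph: stat_summary() collapses each group/year cell to its mean (the fun argument needs ggplot2 >= 3.3; older versions use fun.y):

```r
ggplot(flanker_long, aes(x = Year, y = AccStandard, colour = Group, group = Group)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point")
```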

Scatter Plots

Groups by color

## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).


Gradient

## Warning: Removed 1 rows containing missing values (geom_point).

That’s cool, but they’re all stacked! Let’s add a jitter

## Warning: Removed 1 rows containing missing values (geom_point).

Correlation Matrix

Because, as you can see above, we have terrible data for correlations, we are using mtcars (a built-in data set).

add more information

Great. Now let’s put some numbers in it.
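One way to get a matrix with the coefficients printed in the cells is GGally::ggcorr() (this may not be the exact function used here):

```r
library(GGally)

ggcorr(mtcars, label = TRUE, label_round = 2)
```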

Grobs

Grobs are this cool way of displaying both tables and graphs together.

Really, you can display whatever you want together. Even pictures, I think. I’m pretty sure this works best for ggplots, though. I could be wrong.

Here are two graphs side by side
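With gridExtra, two ggplots (p1 and p2 are assumed names) go side by side like this:

```r
library(gridExtra)

grid.arrange(p1, p2, ncol = 2)
```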

switch them up

add more.

With a table

(obviously the arrangement below doesn’t make any sense at all, but the sentiment is there.)